OA: Anomaly Prediction #1012

Draft
wants to merge 21 commits into main

Commits (21)
4ae6499
OA: Templates for Anomaly Prediction #1011
detlefarend Jun 4, 2024
3aea39f
OA: Templates for Anomaly Prediction #1011
detlefarend Jun 4, 2024
d3a7ace
OA: Templates for Anomaly Prediction #1011
detlefarend Jun 4, 2024
94e9c5e
Merge remote-tracking branch 'origin/main' into oa/streams/ap
detlefarend Jun 5, 2024
525ec17
Merge remote-tracking branch 'origin/main' into oa/streams/ap
detlefarend Jun 10, 2024
2c9728b
Merge remote-tracking branch 'origin/main' into oa/streams/ap
Devindi97 Jun 14, 2024
451bb59
Merge remote-tracking branch 'origin/main' into oa/streams/ap
detlefarend Jun 25, 2024
5f7e885
Merge remote-tracking branch 'origin/main' into oa/streams/ap
detlefarend Jun 26, 2024
bebe902
RTD cleaning #980
steveyuwono Jul 3, 2024
b0a262e
RTD cleaning #980
steveyuwono Jul 3, 2024
e7a5837
RTD cleaning #980
steveyuwono Jul 3, 2024
5d58292
RTD cleaning #980
steveyuwono Jul 3, 2024
2e2e2fd
RTD cleaning #980
steveyuwono Jul 3, 2024
a9b8222
Merge branch 'main' of https://github.com/fhswf/MLPro into rtd_v2.0.0
steveyuwono Jul 11, 2024
ce35047
RTD 2.0.0
steveyuwono Jul 11, 2024
1f33c4e
RTD 2.0.0
steveyuwono Jul 11, 2024
37fd846
RTD 2.0.0
steveyuwono Jul 11, 2024
82a5fdc
RL: Collision Avoidance Problem Environment
steveyuwono Jul 12, 2024
1de60f2
Merge remote-tracking branch 'origin/main' into oa/streams/ap
Devindi97 Jul 12, 2024
ecc9a1d
Merge remote-tracking branch 'origin/rtd_v2.0.0' into oa/streams/ap
Devindi97 Jul 12, 2024
0a30d3c
OA-Streams: Anomaly Prediction #886
detlefarend Jul 12, 2024
@@ -21,7 +21,7 @@ Reusing RL Environments

Alternatively, if your environment follows the Gym or PettingZoo interface, you can apply our
wrappers for integrating third-party packages with MLPro.
For more information about the available third-party packages, please click :ref:`here<target-package-third>`.
For more information about the available third-party packages, please click :ref:`here<target_extension_hub>`.
Then, you need to transfer the wrapped RL environment to a GT Game Board.


@@ -55,8 +55,8 @@ After following the below step-by-step guideline, we expect the user understands
After following the previous steps, we hope that you can practice MLPro-GT and start using this subpackage for your GT-related activities.
For more advanced features, we highly recommend checking out the following howto files:

(a) :ref:`Howto RL-HT-001: Hyperopt <Howto RL HT 001>`
(a) `Howto RL-HT-001: Hyperparameter Tuning using Hyperopt <https://mlpro-int-hyperopt.readthedocs.io/en/latest/content/01_examples_pool/howto.rl.ht.001.html>`_

(b) :ref:`Howto RL-HT-002: Optuna <Howto RL HT 002>`
(b) `Howto RL-HT-002: Hyperparameter Tuning using Optuna <https://mlpro-int-optuna.readthedocs.io/en/latest/content/01_examples_pool/howto.rl.ht.002.html>`_

(c) :ref:`Howto RL-ATT-001: Stagnation Detection <Howto RL ATT 001>`
(c) `Howto RL-ATT-001: Train and Reload Single Agent using Stagnation Detection (Gymnasium) <https://mlpro-int-sb3.readthedocs.io/en/latest/content/01_example_pool/03_howtos_att/howto_rl_att_001_train_and_reload_single_agent_gym_sd.html>`_
28 changes: 12 additions & 16 deletions doc/rtd/content/03_machine_learning/mlpro_rl/sub/02_getstarted.rst
@@ -44,17 +44,17 @@ After following the below step-by-step guideline, we expect the user understands

(a) :ref:`Howto RL-001: Reward <Howto RL 001>`

(b) :ref:`Howto RL-AGENT-001: Run an Agent with Own Policy <Howto Agent RL 001>`
(b) `Howto RL-AGENT-001: Run an Agent with Own Policy <https://mlpro-int-gymnasium.readthedocs.io/en/latest/content/01_example_pool/01_howtos_rl/howto_rl_agent_001_run_agent_with_own_policy_on_gym_environment.html>`_

**5. Understanding Agent in MLPro-RL**
In reinforcement learning, there are two types of agents: single-agent RL and multi-agent RL. Both types are covered by MLPro-RL.
To understand the different possibilities of an agent in MLPro, you can visit :ref:`this page <target_agents_RL>`.

Then, you need to understand how to set up a single-agent and a multi-agent RL in MLPro-RL by following these examples:

(a) :ref:`Howto RL-AGENT-001: Run an Agent with Own Policy <Howto Agent RL 001>`
(a) `Howto RL-AGENT-001: Run an Agent with Own Policy <https://mlpro-int-gymnasium.readthedocs.io/en/latest/content/01_example_pool/01_howtos_rl/howto_rl_agent_001_run_agent_with_own_policy_on_gym_environment.html>`_

(b) :ref:`Howto RL-AGENT-003: Run Multi-Agent with Own Policy <Howto Agent RL 003>`
(b) `Howto RL-AGENT-003: Run Multi-Agent with Own Policy <https://mlpro-int-gymnasium.readthedocs.io/en/latest/content/01_example_pool/01_howtos_rl/howto_rl_agent_003_run_multiagent_with_own_policy_on_multicartpole_environment.html>`_
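For orientation, the following is a minimal sketch of how a custom policy plugs into a single agent. Class names, import paths and signatures are assumptions based on the MLPro-RL templates; the howtos above remain the authoritative reference.

.. code-block:: python

    import random
    from mlpro.rl import Policy, Action   # import paths assumed

    class MyRandomPolicy(Policy):
        """Hypothetical policy illustrating the MLPro-RL policy interface (sketch only)."""

        C_NAME = 'MyRandomPolicy'

        def compute_action(self, p_state):
            # one random value per action dimension (illustration only)
            num_dim = self.get_action_space().get_num_dim()
            values  = [random.uniform(-1, 1) for _ in range(num_dim)]
            return Action(self.get_id(), self.get_action_space(), values)

        def _adapt(self, p_sars_elem) -> bool:
            # no learning in this sketch
            return False

    # A single agent wraps the policy; a MultiAgent aggregates several such agents.
    # agent = Agent(p_policy=MyRandomPolicy(p_observation_space=obs_space,
    #                                       p_action_space=act_space,
    #                                       p_ada=True))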

**6. Selecting between Model-Free and Model-Based RL**
In this section, you need to select the direction of your RL training: model-free RL or model-based RL.
@@ -66,36 +66,32 @@ After following the below step-by-step guideline, we expect the user understands

(a) `A sample application video of MLPro-RL on a UR5 robot <https://ars.els-cdn.com/content/image/1-s2.0-S2665963822001051-mmc2.mp4>`_

(b) :ref:`Howto RL-AGENT-002: Train an Agent with Own Policy <Howto Agent RL 002>`
(b) `Howto RL-AGENT-002: Train an Agent with Own Policy <https://mlpro-int-gymnasium.readthedocs.io/en/latest/content/01_example_pool/01_howtos_rl/howto_rl_agent_002_train_agent_with_own_policy_on_gym_environment.html>`_

(c) :ref:`Howto RL-AGENT-004: Train Multi-Agent with Own Policy <Howto Agent RL 004>`
(c) `Howto RL-AGENT-004: Train Multi-Agent with Own Policy <https://mlpro-int-gymnasium.readthedocs.io/en/latest/content/01_example_pool/01_howtos_rl/howto_rl_agent_004_train_multiagent_with_own_policy_on_multicartpole_environment.html>`_

* Model-Based Reinforcement Learning

Model-based RL comprises two learning paradigms: learning the environment (model-based learning) and utilizing the model (e.g. as an action planner).
To practice model-based RL in the MLPro-RL package, the following howto files can be followed:

(a) :ref:`Howto RL-MB-001: Train and Reload Model Based Agent (Gym) <Howto MB RL 001>`
(a) `Howto RL-MB-001: Train and Reload Model Based Agent (Gymnasium) <https://mlpro-int-sb3.readthedocs.io/en/latest/content/01_example_pool/04_howtos_mb/howto_rl_mb_001_train_and_reload_model_based_agent_gym%20copy.html>`_

(b) :ref:`Howto RL-MB-002: MBRL with MPC on Grid World Environment <Howto MB RL 002>`
(b) :ref:`Howto RL-MB-001: MBRL with MPC on Grid World Environment <Howto MB RL 001>`

For a more advanced MBRL technique, e.g. applying a native MBRL network, here is an example that can be used as a reference:

(c) :ref:`Howto RL-MB-003: MBRL on RobotHTM Environment <Howto MB RL 003>`
(c) `Howto RL-MB-002: MBRL on RobotHTM Environment <https://mlpro-int-sb3.readthedocs.io/en/latest/content/01_example_pool/04_howtos_mb/howto_rl_mb_002_robothtm_environment.html>`_


**7. Additional Guidance**
After following the previous steps, we hope that you can practice MLPro-RL and start using this subpackage for your RL-related activities.
For more advanced features, we highly recommend checking out the following howto files:

(a) :ref:`Howto RL-AGENT-011: Train and Reload Single Agent (Gym) <Howto Agent RL 011>`
(a) `Howto RL-AGENT-001: Train and Reload Single Agent (Gymnasium) <https://mlpro-int-sb3.readthedocs.io/en/latest/content/01_example_pool/01_howtos_agent/howto_rl_agent_001_train_and_reload_single_agent_gym.html>`_

(b) :ref:`Howto RL-AGENT-021: Train and Reload Single Agent (MuJoCo) <Howto Agent RL 021>`
(b) `Howto RL-HT-001: Hyperparameter Tuning using Hyperopt <https://mlpro-int-hyperopt.readthedocs.io/en/latest/content/01_examples_pool/howto.rl.ht.001.html>`_

(c) :ref:`Howto RL-HT-001: Hyperopt <Howto RL HT 001>`
(c) `Howto RL-HT-002: Hyperparameter Tuning using Optuna <https://mlpro-int-optuna.readthedocs.io/en/latest/content/01_examples_pool/howto.rl.ht.002.html>`_

(d) :ref:`Howto RL-HT-002: Optuna <Howto RL HT 002>`

(e) :ref:`Howto RL-ATT-001: Stagnation Detection <Howto RL ATT 001>`

(f) :ref:`Howto RL-ATT-002: SB3 Policy with Stagnation Detection <Howto RL ATT 002>`
(d) `Howto RL-ATT-001: Train and Reload Single Agent using Stagnation Detection (Gymnasium) <https://mlpro-int-sb3.readthedocs.io/en/latest/content/01_example_pool/03_howtos_att/howto_rl_att_001_train_and_reload_single_agent_gym_sd.html>`_
12 changes: 6 additions & 6 deletions doc/rtd/content/03_machine_learning/mlpro_rl/sub/03_env.rst
@@ -33,17 +33,17 @@ There are two main possibilities to set up an environment in MLPro, such as,
env/customenv
env/pool

Alternatively, you can also :ref:`reuse available environments from 3rd-party packages via wrapper classes <target-package-third>` (currently available: OpenAI Gym or PettingZoo).
Alternatively, you can also :ref:`reuse available environments from 3rd-party packages via wrapper classes <target_extension_hub>` (currently available: Gymnasium or PettingZoo).

For reusing the 3rd-party packages, we developed a wrapper technology that transforms an environment from a 3rd-party package into an MLPro-compatible environment.
Additionally, we also provide wrappers for the other direction, from an MLPro environment to the 3rd-party package.
At the moment, there are two ready-to-use wrapper classes. The first wrapper class is intended for OpenAI Gym and the second wrapper is intended for PettingZoo.
At the moment, there are two ready-to-use wrapper classes. The first wrapper class is intended for Gymnasium and the second wrapper is intended for PettingZoo.
The use of the wrapper classes is explained step by step in our how-to files, as follows:

(1) :ref:`OpenAI Gym to MLPro <Howto WP RL 004>`,
(1) `Gymnasium to MLPro <https://mlpro-int-gymnasium.readthedocs.io/en/latest/content/01_example_pool/01_howtos_rl/howto_rl_wp_002_gymnasium_environment_to_mlpro_environment.html>`_,

(2) :ref:`MLPro to OpenAI Gym <Howto WP RL 001>`,
(2) `MLPro to Gymnasium <https://mlpro-int-gymnasium.readthedocs.io/en/latest/content/01_example_pool/01_howtos_rl/howto_rl_wp_001_mlpro_environment_to_gymnasium_environment.html>`_,

(3) :ref:`PettingZoo to MLPro <Howto WP RL 003>`, and
(3) `PettingZoo to MLPro <https://mlpro-int-pettingzoo.readthedocs.io/en/latest/content/01_example_pool/01_howtos_rl/howto_rl_wp_002_run_multiagent_with_own_policy_on_petting_zoo_environment.html>`_, and

(4) :ref:`MLPro to PettingZoo <Howto WP RL 002>`.
(4) `MLPro to PettingZoo <https://mlpro-int-pettingzoo.readthedocs.io/en/latest/content/01_example_pool/01_howtos_rl/howto_rl_wp_001_mlpro_environment_to_petting_zoo_environment.html>`_.
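To illustrate direction (1), here is a minimal sketch of wrapping a Gymnasium environment for MLPro. The wrapper class name, its import path and parameters are assumptions based on the mlpro-int-gymnasium extension; the howto linked above shows the verified version.

.. code-block:: python

    import gymnasium as gym
    from mlpro_int_gymnasium.wrappers import WrEnvGYM2MLPro   # import path assumed

    # native Gymnasium environment
    gym_env = gym.make('CartPole-v1')

    # wrapped as an MLPro-compatible environment
    mlpro_env = WrEnvGYM2MLPro(p_gym_env=gym_env, p_visualize=False, p_logging=False)

    # mlpro_env can now be plugged into an MLPro-RL scenario like any native environment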
@@ -26,5 +26,5 @@ Moreover, the users can create either a single-agent scenario or a multi-agent s
**Cross Reference**

- :ref:`MLPro-RL: Training <target_training_RL>`
- :ref:`Howto RL-AGENT-001: Run an Agent with Own Policy <Howto Agent RL 001>`
- :ref:`Howto RL-AGENT-003: Run Multi-Agent with Own Policy <Howto Agent RL 003>`
- `Howto RL-AGENT-001: Run an Agent with Own Policy <https://mlpro-int-gymnasium.readthedocs.io/en/latest/content/01_example_pool/01_howtos_rl/howto_rl_agent_001_run_agent_with_own_policy_on_gym_environment.html>`_
- `Howto RL-AGENT-003: Run Multi-Agent with Own Policy <https://mlpro-int-gymnasium.readthedocs.io/en/latest/content/01_example_pool/01_howtos_rl/howto_rl_agent_003_run_multiagent_with_own_policy_on_multicartpole_environment.html>`_
16 changes: 7 additions & 9 deletions doc/rtd/content/03_machine_learning/mlpro_rl/sub/06_train.rst
@@ -23,7 +23,7 @@ In this RL training, we always start with a defined random initial state of the
(3) **Event Timeout**: The maximum number of training cycles for an episode has been reached and the current episode ends.

If none of the events is triggered, the training continues. The goal of the training is to maximize the score of the repeated evaluations.
In this case, a :ref:`stagnation detection functionality <Howto RL ATT 001>` can be incorporated to avoid a long training time without any more improvements.
In this case, a `stagnation detection functionality <https://mlpro-int-sb3.readthedocs.io/en/latest/content/01_example_pool/03_howtos_att/howto_rl_att_001_train_and_reload_single_agent_gym_sd.html>`_ can be incorporated to avoid a long training time without any more improvements.
The training can be ended once stagnation is detected. For more information, see `Section 4.3 of the MLPro 1.0 paper <https://doi.org/10.1016/j.mlwa.2022.100341>`_.

In MLPro-RL, we simplify the process of setting up an RL scenario and training for both single-agent and multi-agent RL, as shown below:
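A condensed sketch of such a setup follows; class and parameter names are assumptions based on the MLPro-RL API, and the full template is collapsed in the diff below.

.. code-block:: python

    from mlpro.rl import RLScenario, RLTraining   # import paths assumed

    class MyScenario(RLScenario):
        C_NAME = 'MyScenario'

        def _setup(self, p_mode, p_ada, p_visualize, p_logging):
            # instantiate the environment, store it in self._env, build the
            # (single or multi) agent and return it as the scenario's model
            ...

    training = RLTraining(p_scenario_cls=MyScenario,
                          p_cycle_limit=10000,     # overall number of training cycles
                          p_stagnation_limit=5,    # optional: stop after repeated stagnation
                          p_visualize=False,
                          p_logging=False)
    training.run()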
@@ -125,12 +125,10 @@ In MLPro-RL, we simplify the process of setting up an RL scenario and training f
**Cross Reference**

- `A sample application video of MLPro-RL on a UR5 robot <https://ars.els-cdn.com/content/image/1-s2.0-S2665963822001051-mmc2.mp4>`_
- :ref:`Howto RL-AGENT-002: Train an Agent with Own Policy <Howto Agent RL 002>`
- :ref:`Howto RL-AGENT-004: Train Multi-Agent with Own Policy <Howto Agent RL 004>`
- :ref:`Howto RL-AGENT-011: Train and Reload Single Agent (Gym) <Howto Agent RL 011>`
- :ref:`Howto RL-AGENT-021: Train and Reload Single Agent (MuJoCo) <Howto Agent RL 021>`
- :ref:`Howto RL-ATT-001: Train and Reload Single Agent using Stagnation Detection (Gym) <Howto RL ATT 001>`
- :ref:`Howto RL-ATT-002: Train and Reload Single Agent using Stagnation Detection (MuJoCo) <Howto RL ATT 002>`
- :ref:`Howto RL-MB-001: Train and Reload Model Based Agent (Gym) <Howto MB RL 001>`
- :ref:`Howto RL-MB-002: MBRL with MPC on Grid World Environment <Howto MB RL 002>`
- `Howto RL-AGENT-002: Train an Agent with Own Policy <https://mlpro-int-gymnasium.readthedocs.io/en/latest/content/01_example_pool/01_howtos_rl/howto_rl_agent_002_train_agent_with_own_policy_on_gym_environment.html>`_
- `Howto RL-AGENT-004: Train Multi-Agent with Own Policy <https://mlpro-int-gymnasium.readthedocs.io/en/latest/content/01_example_pool/01_howtos_rl/howto_rl_agent_004_train_multiagent_with_own_policy_on_multicartpole_environment.html>`_
- `Howto RL-AGENT-001: Train and Reload Single Agent (Gymnasium) <https://mlpro-int-sb3.readthedocs.io/en/latest/content/01_example_pool/01_howtos_agent/howto_rl_agent_001_train_and_reload_single_agent_gym.html>`_
- `Howto RL-ATT-001: Train and Reload Single Agent using Stagnation Detection (Gymnasium) <https://mlpro-int-sb3.readthedocs.io/en/latest/content/01_example_pool/03_howtos_att/howto_rl_att_001_train_and_reload_single_agent_gym_sd.html>`_
- `Howto RL-MB-001: Train and Reload Model Based Agent (Gymnasium) <https://mlpro-int-sb3.readthedocs.io/en/latest/content/01_example_pool/04_howtos_mb/howto_rl_mb_001_train_and_reload_model_based_agent_gym%20copy.html>`_
- :ref:`Howto RL-MB-001: MBRL with MPC on Grid World Environment <Howto MB RL 001>`
- :ref:`MLPro-BF-ML: Training and Tuning <target_bf_ml_train_and_tune>`
@@ -59,7 +59,7 @@ Custom Policies
- **Policy from Third Party Packages**

Alternatively, the user can also apply algorithms from Stable Baselines 3 by using the developed relevant wrapper for the integration between third-party packages and MLPro.
For more information, please click :ref:`here<target-package-third>`.
For more information, please click :ref:`here<target_extension_hub>`.
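The following is a minimal sketch of reusing a Stable Baselines 3 algorithm as an MLPro policy. The wrapper class name, import path and parameters are assumptions based on the mlpro-int-sb3 extension; consult its documentation for the verified version.

.. code-block:: python

    from stable_baselines3 import PPO
    from mlpro_int_sb3.wrappers import WrPolicySB32MLPro   # import path assumed

    # SB3 algorithm, set up without its own environment handling
    sb3_algo = PPO(policy='MlpPolicy', env=None, _init_setup_model=False)

    # wrapped so that an MLPro Agent can use it as its policy
    # (obs_space / act_space: hypothetical space objects taken from the MLPro environment)
    policy = WrPolicySB32MLPro(p_sb3_policy=sb3_algo,
                               p_cycle_limit=10000,
                               p_observation_space=obs_space,
                               p_action_space=act_space,
                               p_ada=True,
                               p_logging=False)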

- **Algorithm Checker**

@@ -3,7 +3,7 @@ Model-Based Agents
==================

Model-Based Agents have a different learning target than Model-Free Agents: learning an environment model is not required in model-free RL.
An environment model can be incorporated into a single agent, see :ref:`EnvModel <customEnvModel>` for an overview.
An environment model can be incorporated into a single agent, as **EnvModel**.
Then, this model learns the behaviour and dynamics of the environment.
After learning the environment, the model is optimized to be able to accurately predict the output states, rewards, or status of the environment with respect to the calculated actions.
As a result, if the predictions of the subsequent state and reward diverge too far from the actual values of the environment, the environment model itself is incorporated into the agent's adaptation process and is always retrained.
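Structurally, the environment model is handed to the agent alongside its policy. The following sketch uses hypothetical placeholders (my_policy, my_envmodel, my_planner), and the parameter names are assumptions based on the MLPro-RL agent template.

.. code-block:: python

    from mlpro.rl import Agent   # import path assumed

    # my_policy, my_envmodel and my_planner are hypothetical placeholders for a
    # policy, a learnable EnvModel and an optional action planner (e.g. MPC)
    agent = Agent(p_policy=my_policy,
                  p_envmodel=my_envmodel,       # learned model of the environment
                  p_em_acc_thsld=0.9,           # accuracy threshold below which the model is retrained
                  p_action_planner=my_planner)  # optional action planner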
@@ -112,8 +112,8 @@ the original environment module.

**Cross Reference**

- :ref:`Howto RL-MB-001: Train and Reload Model Based Agent (Gym) <Howto MB RL 001>`
- :ref:`Howto RL-MB-002: MBRL with MPC on Grid World Environment <Howto MB RL 002>`
- :ref:`Howto RL-MB-003: MBRL on RobotHTM Environment <Howto MB RL 003>`
- `Howto RL-AGENT-001: Train and Reload Single Agent (Gymnasium) <https://mlpro-int-sb3.readthedocs.io/en/latest/content/01_example_pool/01_howtos_agent/howto_rl_agent_001_train_and_reload_single_agent_gym.html>`_
- :ref:`Howto RL-MB-001: MBRL with MPC on Grid World Environment <Howto MB RL 001>`
- `Howto RL-MB-002: MBRL on RobotHTM Environment <https://mlpro-int-sb3.readthedocs.io/en/latest/content/01_example_pool/04_howtos_mb/howto_rl_mb_002_robothtm_environment.html>`_
- :ref:`MLPro-SL <target_bf_sl_afct>`

@@ -10,9 +10,9 @@ It is compatible with single-agent but does not have its own policy.
Instead, it is utilized to combine and control any number of single agents that together handle the action computation.
Every single agent in this setting interacts with a separate portion of the surrounding multi-agent observation and action space.
Multi-agent interactions take place in environments that support the reward type of one scalar reward per agent.
These are native applications that incorporate the MLPro environment template or PettingZoo environments that may be incorporated using the corresponding :ref:`wrapper class<target-package-third>` offered by MLPro.
These are native applications that incorporate the MLPro environment template or PettingZoo environments that may be incorporated using the corresponding :ref:`wrapper class<target_extension_hub>` offered by MLPro.
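A minimal sketch of assembling a multi-agent follows; method and parameter names are assumptions based on the MLPro-RL templates, and agent_1/agent_2 are hypothetical Agent instances.

.. code-block:: python

    from mlpro.rl import MultiAgent   # import path assumed

    multi_agent = MultiAgent(p_name='My multi-agent', p_ada=True, p_logging=False)

    # each single agent controls its own portion of the joint observation and action space
    multi_agent.add_agent(p_agent=agent_1, p_weight=1.0)
    multi_agent.add_agent(p_agent=agent_2, p_weight=1.0)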


**Cross Reference**
- :ref:`Howto RL-AGENT-004: Train Multi-Agent with Own Policy <Howto Agent RL 004>`
- `Howto RL-AGENT-004: Train Multi-Agent with Own Policy <https://mlpro-int-gymnasium.readthedocs.io/en/latest/content/01_example_pool/01_howtos_rl/howto_rl_agent_004_train_multiagent_with_own_policy_on_multicartpole_environment.html>`_
- :ref:`MLPro-RL: Training <target_training_RL>`
@@ -41,5 +41,5 @@ Depending on the number of planning horizon, but we believe that this reduces th

**Citation**

If you apply this policy in your research or work, please :ref:`cite <target_publications>` us and the `original paper <https://ieeexplore.ieee.org/document/7989202>`_.
If you apply this policy in your research or work, please :ref:`cite <target_publications>` us and the `original paper <https://ieeexplore.ieee.org/document/10185716>`_.

@@ -196,7 +196,7 @@ Developing Custom Environments

Alternatively, if your environment follows the Gym or PettingZoo interface, you can apply our
wrappers for integrating third-party packages with MLPro. For more
information, please click :ref:`here<target-package-third>`.
information, please click :ref:`here<target_extension_hub>`.

- **Environment Checker**

@@ -9,4 +9,5 @@ Reusing Environment from the Pool
pool/multicartpole
pool/gridworld
pool/robotmanipulator
pool/doublependulum
pool/doublependulum
pool/2Dcollisiondetection