# Basic Reinforcement Learning Example


## Verify PyTorch Installation
* Reinforcement learning training is usually GPU-accelerated, so installing a CUDA-enabled build of PyTorch can significantly speed up training.
* The following code checks the PyTorch installation. Its output shows whether a GPU-accelerated (CUDA-enabled) build of PyTorch that is compatible with your CUDA version is installed.



In [None]:
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("CUDA version:", torch.version.cuda)
    print("Current CUDA device:", torch.cuda.current_device())
    print("CUDA device name:", torch.cuda.get_device_name(torch.cuda.current_device()))
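If `CUDA available` prints `False`, there are two common causes: a CPU-only PyTorch build, or a CUDA build that cannot find a usable GPU or driver. `torch.version.cuda` tells them apart because it is `None` for CPU-only builds. The sketch below combines both checks. The helper name `describe_pytorch_build` is hypothetical, and the sketch assumes an NVIDIA CUDA build (ROCm builds also report `torch.version.cuda` as `None`).

```python
from typing import Optional


def describe_pytorch_build(cuda_build: Optional[str], cuda_available: bool) -> str:
    """Classify the PyTorch install (hypothetical helper).

    cuda_build: value of torch.version.cuda (None for CPU-only builds).
    cuda_available: value of torch.cuda.is_available().
    Assumes an NVIDIA CUDA build; ROCm builds are not distinguished here.
    """
    if cuda_build is None:
        return "CPU-only build: reinstall a CUDA-enabled PyTorch package"
    if not cuda_available:
        return f"CUDA {cuda_build} build, but no usable GPU/driver was found"
    return f"CUDA {cuda_build} build with a working GPU"


try:
    import torch

    print(describe_pytorch_build(torch.version.cuda, torch.cuda.is_available()))
except ImportError:
    # The first thing to rule out: PyTorch is not installed at all
    print("PyTorch is not installed")
```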

In [None]:
import time

import torch

# Check if CUDA is available
if not torch.cuda.is_available():
    raise RuntimeError("CUDA is not available. Please check your PyTorch installation.")

# Print CUDA device information
device = torch.device("cuda")
print("Using device:", torch.cuda.get_device_name(device))

# Create two large random matrices on the GPU.
# Each 20000 x 20000 float32 tensor takes about 1.6 GB; reduce tensor_size on GPUs with less memory.
tensor_size = 20000
x = torch.rand((tensor_size, tensor_size), device=device)
y = torch.rand((tensor_size, tensor_size), device=device)

# Warm-up run so one-time CUDA/cuBLAS initialization is not included in the timing
result = torch.mm(x, y)
torch.cuda.synchronize()

# Perform multiple matrix multiplications and measure the time.
# CUDA kernels run asynchronously, so wait for them to finish before reading the clock.
num_iterations = 10
start_time = time.perf_counter()

for _ in range(num_iterations):
    result = torch.mm(x, y)

torch.cuda.synchronize()
elapsed_time = time.perf_counter() - start_time

print(f"{num_iterations} iterations of matrix multiplication completed in {elapsed_time:.4f} seconds.")
print(f"Average time per iteration: {elapsed_time / num_iterations:.4f} seconds.")

# Check one element of the result to confirm the operation succeeded
print("Result[0, 0]:", result[0, 0].item())


## Validate Training with 16 Agents

1. Open the `Franka_RL` level and click **Run**.
2. Run the code below to start training.
3. The Franka robot arms start moving quickly, and the console prints training information. In this task, each agent learns to move the robot arm's end effector to the transparent red cube marker. Because the task is relatively simple, each agent can reach a high success rate within 200 episodes. Training takes about 20 minutes, depending on your hardware.

In [None]:
import subprocess
import sys

# Run the training script from its own directory (path is relative to this notebook)
script_dir = "../examples/franka_rl/"
script_path = "franka_mocap_multi_agents.py"
command = [
    sys.executable, script_path,  # use the same Python interpreter as this notebook
    "--orcagym_addresses", "localhost:50051",
    "--subenv_num", "1",
    "--agent_num", "16",
    "--task", "reach",
    "--model_type", "ppo",
    "--run_mode", "training",
    "--training_episode", "200",
]
try:
    subprocess.run(command, cwd=script_dir)
except KeyboardInterrupt:
    # Interrupting the notebook kernel stops the training process
    print("Training stopped by user.")

4. When training is complete, run the following code to validate the training results:

In [None]:
import subprocess
import sys

# Run the same script in testing mode to evaluate the trained agents
script_dir = "../examples/franka_rl/"
script_path = "franka_mocap_multi_agents.py"
command = [
    sys.executable, script_path,  # use the same Python interpreter as this notebook
    "--orcagym_addresses", "localhost:50051",
    "--subenv_num", "1",
    "--agent_num", "16",
    "--task", "reach",
    "--model_type", "ppo",
    "--run_mode", "testing",
    "--training_episode", "200",
]

subprocess.run(command, cwd=script_dir)
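The training and testing cells differ only in `--run_mode`. A small command builder is one way to keep the remaining arguments in sync. `build_franka_command` below is a hypothetical helper: it uses only the flags from the two cells above and defaults to the values used in this example.

```python
import sys
from typing import List


def build_franka_command(run_mode: str, task: str = "reach", model_type: str = "ppo",
                         agent_num: int = 16, subenv_num: int = 1,
                         training_episode: int = 200,
                         address: str = "localhost:50051") -> List[str]:
    """Assemble the franka_mocap_multi_agents.py command line (hypothetical helper)."""
    return [
        sys.executable, "franka_mocap_multi_agents.py",
        "--orcagym_addresses", address,
        "--subenv_num", str(subenv_num),
        "--agent_num", str(agent_num),
        "--task", task,
        "--model_type", model_type,
        "--run_mode", run_mode,
        "--training_episode", str(training_episode),
    ]


# Training and testing share every argument except --run_mode
train_command = build_franka_command("training")
test_command = build_franka_command("testing")
print(" ".join(test_command))
```

Either list can be passed to `subprocess.run(..., cwd=script_dir)` exactly as in the cells above.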

Next, you can try other tasks, such as **pick_and_place**, or explore different training algorithms like **DDPG**. 

The **pick_and_place** task is significantly more complex than the **reach** task and may require many more training episodes. To speed up training, increase the number of **subenvs** (the `--subenv_num` argument) so that more environments run in parallel. A **subenv** is a clone of **OrcaGymEnv** that runs without displaying its results in **OrcaStudio**.
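A possible starting point for a **pick_and_place** run with more subenvs is sketched below. It reuses the flags from the **reach** cells above, but the specific values are assumptions. They include `pick_and_place` as the `--task` value, 4 subenvs, and 1000 training episodes, and none of them come from the example script. Check **examples/franka_rl/franka_mocap_multi_agents.py** for the accepted values.

```python
import os
import subprocess
import sys

# Hypothetical settings: the task value, subenv count, and episode count are
# illustrative guesses, not values confirmed by franka_mocap_multi_agents.py.
script_dir = "../examples/franka_rl/"
command = [
    sys.executable, "franka_mocap_multi_agents.py",
    "--orcagym_addresses", "localhost:50051",
    "--subenv_num", "4",  # more headless clones -> more concurrent rollouts
    "--agent_num", "16",
    "--task", "pick_and_place",
    "--model_type", "ppo",
    "--run_mode", "training",
    "--training_episode", "1000",
]

if os.path.isdir(script_dir):
    try:
        subprocess.run(command, cwd=script_dir)
    except KeyboardInterrupt:
        print("Training stopped by user.")
else:
    # Notebook is not next to the examples directory; show the command instead
    print("Example directory not found; command would be:", " ".join(command))
```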

For specific parameters and code examples, refer to **examples/franka_rl/franka_mocap_multi_agents.py**. You can also improve the reward function for the **pick_and_place** or **reach** tasks to achieve better training results.
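As a starting point for experimenting with the **reach** reward, one common shaping scheme gives the negative end-effector-to-target distance at every step, plus a bonus once the end effector is within a small threshold of the target. The sketch below is a hypothetical standalone function, not the reward implemented in **franka_mocap_multi_agents.py**. The name `reach_reward`, the 2 cm threshold, and the bonus value are all assumptions.

```python
import math
from typing import Sequence


def reach_reward(ee_pos: Sequence[float], target_pos: Sequence[float],
                 success_threshold: float = 0.02, success_bonus: float = 1.0) -> float:
    """Hypothetical dense reach reward: -distance, plus a bonus inside the threshold.

    Positions are assumed to be 3D coordinates in meters.
    """
    distance = math.dist(ee_pos, target_pos)  # Euclidean distance (Python 3.8+)
    reward = -distance  # closer to the target -> higher reward
    if distance < success_threshold:
        reward += success_bonus  # sparse bonus for reaching the marker
    return reward


print(reach_reward((0.10, 0.0, 0.0), (0.0, 0.0, 0.0)))
print(reach_reward((0.0, 0.0, 0.0), (0.0, 0.0, 0.0)))
```

The dense distance term gives the policy a gradient toward the marker from anywhere in the workspace, while the bonus marks success explicitly. Tuning the balance between the two is one place to start when improving results.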
