Skip to content

ruddyscent/hover-pilot

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

10 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

HoverPilot

License

Minimal Python client to connect to RealFlight Link (TCP 18083), exchange RC commands, and expose a Gymnasium-compatible hover environment.

Quickstart

pip install -e .
cp .env.example .env
python -m hoverpilot.main

Gymnasium Environment

The project now exposes a Gymnasium-style environment:

import numpy as np

from hoverpilot.config import HOST, PORT
from hoverpilot.envs import HoverPilotHoverEnv

env = HoverPilotHoverEnv(
    host=HOST,
    port=PORT,
    max_episode_steps=250,
)
observation, info = env.reset()
action = np.asarray([0.0, 0.0, 0.55, 0.0], dtype=np.float32)
observation, reward, terminated, truncated, info = env.step(action)

The API mirrors Gymnasium:

  • reset(...) -> (observation, info)
  • step(action) -> (observation, reward, terminated, truncated, info)

Action format is a 4-element float32 array:

  • index 0: aileron in [-1, 1]
  • index 1: elevator in [-1, 1]
  • index 2: throttle in [0, 1]
  • index 3: rudder in [-1, 1]

Observation format is a compact 12-element float32 vector for hover training:

  • position: x, y, altitude_agl
  • attitude: roll, inclination, azimuth
  • world velocity: u, v, w
  • angular rates: pitch_rate, roll_rate, yaw_rate

Reward and termination are integrated from hoverpilot.training.hover:

  • reward prefers staying near the target hover point and upright attitude
  • boundary proximity adds a growing penalty before failure
  • terminal failures include trainer boundary exit, altitude bounds, lost components, locked vehicle states, configured controller inactivity, configured engine stop, and post-start ground contact

reset() waits for a usable start state before returning. During this warmup the environment keeps sending a safe idle action and polls RealFlight until readiness is satisfied or a timeout is reached.

Demo

Run:

python -m hoverpilot.main

The demo prints:

  • observation shape
  • scalar reward
  • terminated / truncated
  • termination reason when present
  • current AGL altitude from info["debug_state"]
  • a concise RealFlight state summary

The demo keeps running across episodes and only stops on KeyboardInterrupt. During reset wait periods it rate-limits the waiting for trainer reset log to avoid flooding the terminal.

Episode Lifecycle

Airplane Hover Trainer does not always expose a clean explicit reset flag through RealFlight Link, so the environment manages episode lifecycle conservatively.

Episode start:

  • reset() and reset-wait polling both use a safe idle action.
  • A state is considered ready when it is not obviously uninitialized, not locked, and not already failed.
  • Controller-active and engine-running checks are available as configurable readiness gates because these fields can behave differently across trainer modes.
  • Ground contact is allowed during startup by default because some trainer resets spawn on or very near the ground.

Episode end:

  • Hard terminal failures include:
    • m_hasLostComponents > threshold
    • boundary exit in x or y
    • altitude too low / too high
    • locked vehicle state
    • configured controller inactive / engine stopped conditions
    • post-start ground contact after the configured grace period
  • m_currentAircraftStatus is currently treated as opaque. It is exposed in info["debug_state"], and only becomes terminal if you explicitly configure known terminal status codes.

Reset-wait and restart:

  • After termination, the environment keeps polling with a safe idle action until a new episode can be started.
  • Restart signals are checked in this order:
    • reset button pressed
    • physics time rollback
    • trainer-driven reposition / teleport into a reset-like stationary state
  • In the current Airplane Hover Trainer setup, the more semantic crash / recovery flags (m_hasLostComponents, m_anEngineIsRunning, m_isTouchingGround) often stay fixed at 0, so they are still exposed in debug_state but are not used as primary reset signals. You can verify that behavior with: RFLINK_DEBUG_STATE_FLAGS=1 python -m hoverpilot.main

Useful tuning parameters on HoverPilotHoverEnv:

  • readiness / warmup:
    • max_reset_wait_seconds
    • reset_poll_interval_seconds
    • ready_controller_active_threshold
    • ready_running_threshold
    • ready_locked_threshold
    • allow_ground_contact_at_ready
  • teleport fallback:
    • reposition_speed_threshold_mps
    • reset_teleport_distance_m
  • termination thresholds via RewardConfig:
    • controller_active_threshold
    • terminate_on_engine_stopped
    • ground_contact_grace_seconds
    • known_terminal_aircraft_status_codes

The environment prefers explicit RealFlight Link reset signals first. Teleport / reposition detection is kept as a fallback because the Hover Trainer can reset by suddenly moving the aircraft without updating the more semantic lifecycle flags.

License

This project is licensed under the MIT License.
See the LICENSE file for details.

About

Train an RL agent to hover an RC airplane in RealFlight

Topics

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors

Languages