Minimal Python client to connect to RealFlight Link (TCP 18083), exchange RC commands, and expose a Gymnasium-compatible hover environment.
```bash
pip install -e .
cp .env.example .env
python -m hoverpilot.main
```

The project now exposes a Gymnasium-style environment:
```python
import numpy as np

from hoverpilot.config import HOST, PORT
from hoverpilot.envs import HoverPilotHoverEnv

env = HoverPilotHoverEnv(
    host=HOST,
    port=PORT,
    max_episode_steps=250,
)

observation, info = env.reset()
# [aileron, elevator, throttle, rudder]: centered sticks, 55% throttle
action = np.asarray([0.0, 0.0, 0.55, 0.0], dtype=np.float32)
observation, reward, terminated, truncated, info = env.step(action)
```

The API mirrors Gymnasium:
- `reset(...) -> (observation, info)`
- `step(action) -> (observation, reward, terminated, truncated, info)`
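Because the environment follows the Gymnasium `reset()`/`step()` contract, a standard rollout loop applies. The sketch below is illustrative: `run_episode` is a hypothetical helper, not part of hoverpilot, and it only assumes the two signatures listed above.

```python
from typing import Any, Callable, Dict, Tuple

import numpy as np


def run_episode(
    env: Any, policy: Callable[[np.ndarray], np.ndarray]
) -> Tuple[float, int, Dict[str, Any]]:
    """Roll out one episode and return (total_reward, steps, final_info).

    Hypothetical helper: relies only on the Gymnasium-style reset/step contract.
    """
    observation, info = env.reset()
    total_reward = 0.0
    steps = 0
    while True:
        action = policy(observation)
        observation, reward, terminated, truncated, info = env.step(action)
        total_reward += float(reward)
        steps += 1
        # An episode ends on a terminal failure (terminated) or a step limit (truncated).
        if terminated or truncated:
            return total_reward, steps, info
```

With the real environment, `run_episode(env, lambda obs: action)` would replay the constant hover action from the example until the episode ends.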
Action format is a 4-element `float32` array:

- index 0: `aileron` in `[-1, 1]`
- index 1: `elevator` in `[-1, 1]`
- index 2: `throttle` in `[0, 1]`
- index 3: `rudder` in `[-1, 1]`
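If you build actions from named stick inputs, a small packing helper keeps the index order and ranges in one place. This is a minimal sketch: `make_action`, `ACTION_LOW` and `ACTION_HIGH` are hypothetical names, and clipping on the client side is an assumption rather than documented behavior of `HoverPilotHoverEnv`.

```python
import numpy as np

# Per-index bounds from the action format: aileron, elevator, throttle, rudder.
ACTION_LOW = np.array([-1.0, -1.0, 0.0, -1.0], dtype=np.float32)
ACTION_HIGH = np.array([1.0, 1.0, 1.0, 1.0], dtype=np.float32)


def make_action(aileron: float, elevator: float, throttle: float, rudder: float) -> np.ndarray:
    """Pack named stick inputs into a 4-element float32 action clipped to range.

    Hypothetical helper; the environment itself may or may not clip.
    """
    raw = np.array([aileron, elevator, throttle, rudder], dtype=np.float32)
    return np.clip(raw, ACTION_LOW, ACTION_HIGH).astype(np.float32)
```

For example, `make_action(0.0, 0.0, 0.55, 0.0)` produces the same action as the hover example above.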
Observation format is a compact 12-element `float32` vector for hover training:

- position: `x`, `y`, `altitude_agl`
- attitude: `roll`, `inclination`, `azimuth`
- world velocity: `u`, `v`, `w`
- angular rates: `pitch_rate`, `roll_rate`, `yaw_rate`
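For logging or debugging it can help to name the observation entries. The sketch below assumes the vector is laid out in exactly the order listed above (position, attitude, world velocity, angular rates); `OBS_FIELDS` and `unpack_observation` are hypothetical names, so check the ordering against the environment source before relying on it.

```python
from typing import Dict

import numpy as np

# Assumed field order, following the documented grouping.
OBS_FIELDS = (
    "x", "y", "altitude_agl",             # position
    "roll", "inclination", "azimuth",     # attitude
    "u", "v", "w",                        # world velocity
    "pitch_rate", "roll_rate", "yaw_rate",  # angular rates
)


def unpack_observation(observation: np.ndarray) -> Dict[str, float]:
    """Map the 12-element observation vector onto named fields."""
    if observation.shape != (len(OBS_FIELDS),):
        raise ValueError(f"expected shape ({len(OBS_FIELDS)},), got {observation.shape}")
    return {name: float(value) for name, value in zip(OBS_FIELDS, observation)}
```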
The environment integrates reward and termination logic from `hoverpilot.training.hover`:
- reward prefers staying near the target hover point and upright attitude
- boundary proximity adds a growing penalty before failure
- terminal failures include trainer boundary exit, altitude bounds, lost components, locked vehicle states, configured controller inactivity, configured engine stop, and post-start ground contact
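To make the reward description concrete, here is one way such a shaping could be written. This is an illustrative sketch only: the functional form, the weights and the names `IllustrativeRewardWeights` and `illustrative_hover_reward` are assumptions, not the formula used by `hoverpilot.training.hover`. It shows the two documented ideas: reward is highest near the target and upright, and a penalty ramps up as the aircraft approaches the boundary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IllustrativeRewardWeights:
    # Hypothetical weights; NOT the values used by hoverpilot.training.hover.
    position: float = 1.0
    attitude: float = 0.5
    boundary: float = 2.0
    boundary_margin_m: float = 10.0


def illustrative_hover_reward(
    dist_to_target_m: float,
    tilt_rad: float,
    dist_to_boundary_m: float,
    weights: Optional[IllustrativeRewardWeights] = None,
) -> float:
    """Penalize distance from the hover target and tilt from upright, plus a
    boundary penalty that grows quadratically inside the margin."""
    w = weights if weights is not None else IllustrativeRewardWeights()
    reward = -w.position * dist_to_target_m - w.attitude * abs(tilt_rad)
    if dist_to_boundary_m < w.boundary_margin_m:
        # 0 at the edge of the margin, 1 at the boundary itself.
        closeness = 1.0 - max(dist_to_boundary_m, 0.0) / w.boundary_margin_m
        reward -= w.boundary * closeness ** 2
    return reward
```

Squaring the closeness term keeps the penalty gentle at the edge of the margin and steep near the boundary, which is one common way to warn a policy before the hard boundary-exit termination fires.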
`reset()` waits for a usable start state before returning. During this warmup, the environment keeps sending a safe idle action and polls RealFlight until the readiness checks pass or a timeout is reached.
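The warmup is a poll-with-deadline loop. The sketch below shows the pattern under stated assumptions: `wait_until_ready` is a hypothetical function, `poll` is assumed to send the idle action and return the latest state, and returning `None` on timeout is a choice made here (the real environment may raise instead). The two timing arguments play the role of `max_reset_wait_seconds` and `reset_poll_interval_seconds`.

```python
import time
from typing import Any, Callable, Optional


def wait_until_ready(
    poll: Callable[[], Any],
    is_ready: Callable[[Any], bool],
    max_wait_seconds: float,
    poll_interval_seconds: float,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[Any]:
    """Poll until is_ready(state) holds; return that state, or None on timeout.

    `poll` is assumed to send the safe idle action and return the newest state.
    `clock` and `sleep` are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + max_wait_seconds
    while True:
        state = poll()
        if is_ready(state):
            return state
        if clock() >= deadline:
            return None
        sleep(poll_interval_seconds)
```

Using `time.monotonic` rather than wall-clock time keeps the deadline correct even if the system clock is adjusted during a long wait.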
Run:

```bash
python -m hoverpilot.main
```

The demo prints:
- observation shape
- scalar reward
- `terminated` / `truncated`
- termination reason when present
- current AGL altitude from `info["debug_state"]`
- a concise RealFlight state summary
The demo keeps running across episodes and stops only on `KeyboardInterrupt`. While waiting for a reset, it rate-limits the "waiting for trainer reset" log message to avoid flooding the terminal.
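A per-message rate limiter is enough to keep a polling loop quiet. This is a minimal sketch with a hypothetical `RateLimitedLog` class; it is not the demo's actual logging code, and the interval is an arbitrary example value.

```python
import time
from typing import Callable, Dict


class RateLimitedLog:
    """Print a message at most once per `min_interval_seconds` for each key.

    Hypothetical helper; the clock is injectable for testing.
    """

    def __init__(self, min_interval_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.min_interval_seconds = min_interval_seconds
        self.clock = clock
        self._last_emit: Dict[str, float] = {}

    def log(self, key: str, message: str) -> bool:
        """Print `message` unless `key` was emitted too recently; return True if printed."""
        now = self.clock()
        last = self._last_emit.get(key)
        if last is not None and now - last < self.min_interval_seconds:
            return False  # suppressed
        self._last_emit[key] = now
        print(message)
        return True
```

Inside a reset-wait loop you would call something like `limiter.log("reset_wait", "waiting for trainer reset")` on every poll and let the limiter decide whether it reaches the terminal.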
The Airplane Hover Trainer does not always expose a clean, explicit reset flag through RealFlight Link, so the environment manages the episode lifecycle conservatively.
Episode start:
- `reset()` and reset-wait polling both use a safe idle action.
- A state is considered ready when it is not obviously uninitialized, not locked, and not already failed.
- Controller-active and engine-running checks are available as configurable readiness gates because these fields can behave differently across trainer modes.
- Ground contact is allowed during startup by default because some trainer resets spawn on or very near the ground.
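The readiness rules above can be expressed as a single predicate with optional gates. This sketch is illustrative: `StateFlags`, `ReadinessGates` and `is_ready` are hypothetical names, and treating a non-positive physics time as "uninitialized" is an assumption made here. The gate fields loosely correspond to `ready_locked_threshold`, `ready_controller_active_threshold`, `ready_running_threshold` and `allow_ground_contact_at_ready`, but their exact semantics in hoverpilot may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StateFlags:
    # Hypothetical snapshot of RealFlight Link values; the real parsing and
    # field names inside hoverpilot are not shown in this README.
    physics_time_s: float
    is_locked: float
    has_lost_components: float
    controller_active: float
    engine_running: float
    touching_ground: float


@dataclass
class ReadinessGates:
    locked_threshold: float = 0.5
    controller_active_threshold: Optional[float] = None  # None disables the gate
    running_threshold: Optional[float] = None  # None disables the gate
    allow_ground_contact: bool = True  # some resets spawn on the ground


def is_ready(s: StateFlags, g: ReadinessGates) -> bool:
    """Return True if the state looks like a usable episode start."""
    if s.physics_time_s <= 0.0:  # assumption: zero clock means not yet initialized
        return False
    if s.is_locked >= g.locked_threshold:
        return False
    if s.has_lost_components > 0:  # already failed
        return False
    if g.controller_active_threshold is not None and s.controller_active < g.controller_active_threshold:
        return False
    if g.running_threshold is not None and s.engine_running < g.running_threshold:
        return False
    if not g.allow_ground_contact and s.touching_ground > 0:
        return False
    return True
```

Making the controller and engine gates optional mirrors the point above: these fields behave differently across trainer modes, so they should only block readiness when you opt in.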
Episode end:

- Hard terminal failures include:
  - `m_hasLostComponents` > threshold
  - boundary exit in `x` or `y`
  - altitude too low / too high
  - locked vehicle state
  - configured controller-inactive / engine-stopped conditions
  - post-start ground contact after the configured grace period
- `m_currentAircraftStatus` is currently treated as opaque. It is exposed in `info["debug_state"]` and only becomes terminal if you explicitly configure known terminal status codes (`known_terminal_aircraft_status_codes` on `RewardConfig`).
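A termination check of this kind usually returns the first failure it finds as a reason string, which is what the demo prints. The sketch below is hypothetical throughout: `Snapshot`, `TerminationLimits`, `termination_reason`, the reason strings and all default limits are illustrative. Some field names echo `RewardConfig` parameters, but this is not that class.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass
class Snapshot:
    # Hypothetical per-step view of the state.
    x: float
    y: float
    altitude_agl: float
    lost_components: float
    locked: bool
    controller_active: float
    engine_running: bool
    touching_ground: bool
    aircraft_status: int
    seconds_since_start: float


@dataclass
class TerminationLimits:
    # Illustrative defaults only; not hoverpilot's real thresholds.
    max_lost_components: float = 0.0
    boundary_half_width_m: float = 50.0
    min_altitude_m: float = 0.0  # keep at or below spawn height
    max_altitude_m: float = 30.0
    controller_active_threshold: Optional[float] = None  # None disables the check
    terminate_on_engine_stopped: bool = False
    ground_contact_grace_seconds: float = 2.0
    known_terminal_aircraft_status_codes: FrozenSet[int] = frozenset()


def termination_reason(s: Snapshot, limits: TerminationLimits) -> Optional[str]:
    """Return a reason string for the first hard failure found, else None."""
    if s.lost_components > limits.max_lost_components:
        return "lost_components"
    if abs(s.x) > limits.boundary_half_width_m or abs(s.y) > limits.boundary_half_width_m:
        return "boundary_exit"
    if not limits.min_altitude_m <= s.altitude_agl <= limits.max_altitude_m:
        return "altitude_out_of_bounds"
    if s.locked:
        return "locked"
    if limits.controller_active_threshold is not None and s.controller_active < limits.controller_active_threshold:
        return "controller_inactive"
    if limits.terminate_on_engine_stopped and not s.engine_running:
        return "engine_stopped"
    # Ground contact only counts once the post-start grace period has elapsed.
    if s.touching_ground and s.seconds_since_start > limits.ground_contact_grace_seconds:
        return "ground_contact"
    # The aircraft status stays opaque unless specific codes are configured as terminal.
    if s.aircraft_status in limits.known_terminal_aircraft_status_codes:
        return "terminal_aircraft_status"
    return None
```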
Reset-wait and restart:
- After termination, the environment keeps polling with a safe idle action until a new episode can be started.
- Restart signals are checked in this order:
  - reset button pressed
  - physics time rollback
  - trainer-driven reposition / teleport into a reset-like stationary state
- In the current Airplane Hover Trainer setup, the more semantic crash / recovery flags (`m_hasLostComponents`, `m_anEngineIsRunning`, `m_isTouchingGround`) often stay fixed at `0`, so they are still exposed in `debug_state` but are not used as primary reset signals. You can verify this behavior with:

  ```bash
  RFLINK_DEBUG_STATE_FLAGS=1 python -m hoverpilot.main
  ```
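The ordered restart check can be sketched as a comparison between two consecutive polls. Everything here is an assumption for illustration: `LinkSample` and `restart_signal` are hypothetical names, and the default values given for the two thresholds are arbitrary examples, not hoverpilot's defaults. The threshold arguments are named after `reposition_speed_threshold_mps` and `reset_teleport_distance_m`.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class LinkSample:
    # Hypothetical per-poll sample; field names are illustrative.
    reset_button_pressed: bool
    physics_time_s: float
    x: float
    y: float
    altitude_agl: float
    speed_mps: float


def restart_signal(
    prev: LinkSample,
    curr: LinkSample,
    reposition_speed_threshold_mps: float = 0.5,  # example value only
    reset_teleport_distance_m: float = 5.0,  # example value only
) -> Optional[str]:
    """Check restart signals in priority order and return the first that fires."""
    # 1. Explicit reset button.
    if curr.reset_button_pressed:
        return "reset_button"
    # 2. Physics clock went backwards, which suggests the simulation restarted.
    if curr.physics_time_s < prev.physics_time_s:
        return "physics_time_rollback"
    # 3. Fallback: a large jump between polls that lands the aircraft in a
    #    near-stationary, reset-like state.
    jump_m = math.dist((prev.x, prev.y, prev.altitude_agl), (curr.x, curr.y, curr.altitude_agl))
    if jump_m >= reset_teleport_distance_m and curr.speed_mps <= reposition_speed_threshold_mps:
        return "teleport"
    return None
```

Requiring low speed after the jump is what separates a trainer reposition from a fast but genuine flight segment: a big displacement alone would also match a high-speed pass between sparse polls.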
Useful tuning parameters on `HoverPilotHoverEnv`:

- readiness / warmup: `max_reset_wait_seconds`, `reset_poll_interval_seconds`, `ready_controller_active_threshold`, `ready_running_threshold`, `ready_locked_threshold`, `allow_ground_contact_at_ready`
- teleport fallback: `reposition_speed_threshold_mps`, `reset_teleport_distance_m`
- termination thresholds via `RewardConfig`: `controller_active_threshold`, `terminate_on_engine_stopped`, `ground_contact_grace_seconds`, `known_terminal_aircraft_status_codes`
The environment prefers explicit RealFlight Link reset signals first. Teleport / reposition detection is kept as a fallback because the Hover Trainer can reset by suddenly moving the aircraft without updating the more semantic lifecycle flags.
This project is licensed under the MIT License.
See the LICENSE file for details.