v1.2.3

@rgfaber rgfaber tagged this 17 Feb 10:25
agent_bridge:evaluate_network/2 always used stateless evaluate/2, even
for CfC/LTC networks. As a result, temporal dynamics were never trained:
networks learned stateless policies but were deployed with stateful
evaluate_with_state/2, causing behavior divergence in duels.

Now detects CfC via get_neuron_meta/1 and uses evaluate_with_state/2.
episode_loop threads the updated network through each iteration so
temporal state accumulates across ticks, matching deployment exactly.
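
The state threading described above can be sketched as follows. This is a
minimal illustration, not the library's code: the module name, the map-based
network, and the {Outputs, UpdatedNetwork} return shape of the local
evaluate_with_state/2 stand-in are all assumptions made for the example.

```erlang
-module(episode_sketch).
-export([run/2]).

%% Hypothetical stand-in for the library's evaluate_with_state/2.
%% Assumption: it returns {Outputs, UpdatedNetwork}. The "network" here
%% is just a map holding one hidden value, so the effect of state is visible.
evaluate_with_state(#{hidden := H} = Net, Inputs) ->
    NewH = 0.5 * H + lists:sum(Inputs),
    {[NewH], Net#{hidden := NewH}}.

%% Run Ticks iterations, feeding each tick's updated network into the next.
run(Net, Ticks) ->
    loop(Net, Ticks, []).

loop(Net, 0, Acc) ->
    {lists:reverse(Acc), Net};
loop(Net, N, Acc) ->
    {Out, Net1} = evaluate_with_state(Net, [1.0]),
    %% Passing Net1 (not Net) is what lets temporal state accumulate.
    loop(Net1, N - 1, [Out | Acc]).
```

With constant input, episode_sketch:run(#{hidden => 0.0}, 3) yields different
outputs on each tick because the hidden value carries over. Passing the
original Net back into loop/3 would yield the same output every tick, which
is the stateless behavior this release removes from training.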

BREAKING: sense_think_act/4 returns 4-tuple {I, O, A, UpdatedNetwork}
(was 3-tuple {I, O, A}).
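
Callers that pattern-match on the old 3-tuple will fail with a badmatch
against the new return value. A hedged migration sketch follows; the stub
below and its argument names are placeholders, since only the arity and the
4-tuple return shape are stated in this note.

```erlang
-module(migration_sketch).
-export([tick/4]).

%% Hypothetical stub standing in for the real sense_think_act/4.
%% Only the return shape {I, O, A, UpdatedNetwork} is taken from the
%% release note; the arguments are placeholders.
sense_think_act(Network, _Arg2, _Arg3, _Arg4) ->
    {[0.0], [0.0], noop, Network}.

tick(Network, Arg2, Arg3, Arg4) ->
    %% Before v1.2.3 a caller would have matched:
    %%   {I, O, A} = sense_think_act(...)
    %% From v1.2.3 on, bind the fourth element and carry it forward.
    {_I, _O, A, Network1} = sense_think_act(Network, Arg2, Arg3, Arg4),
    {A, Network1}.
```

Discarding the fourth element with an underscore would satisfy the match
but lose the accumulated temporal state, so callers that loop should pass
the returned network into the next call.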

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>