v0.3.0

@matee8 released this 19 Nov 00:29
· 10 commits to main since this release
c065391

Added

  • Full Reinforcement Learning pipeline for training custom backtracking policies (backtracking-llm-train-rl).
  • RlPolicyOperator for executing trained PPO policies during inference.
  • Gymnasium environment with LLM-as-a-Judge scoring and intermediate reward shaping.
  • GenerationSession class for fine-grained, stateful control over the generation loop.
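As a rough illustration of what intermediate reward shaping combined with a final judge score can look like, here is a minimal sketch. It is not the project's actual reward function: the name `shaped_rewards`, the penalty values, and the assumption that the judge returns a single score at the end of an episode are all hypothetical.

```python
def shaped_rewards(backtrack_flags, judge_score,
                   step_penalty=0.01, backtrack_penalty=0.05):
    """Return one reward per generation step (hypothetical scheme).

    backtrack_flags: one bool per step, True if the policy backtracked.
    judge_score: final score from the judge model, added to the last step.
    """
    rewards = []
    for did_backtrack in backtrack_flags:
        # Small per-step cost so shorter episodes are preferred.
        reward = -step_penalty
        if did_backtrack:
            # Extra cost so backtracking is only used when it pays off.
            reward -= backtrack_penalty
        rewards.append(reward)
    if rewards:
        # The sparse judge score arrives once, at the end of the episode.
        rewards[-1] += judge_score
    return rewards
```

The intermediate penalties give the policy a learning signal at every step instead of only at the end, which is the usual motivation for shaping a sparse terminal reward.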

Fixed

  • Fixed critical state leakage in CLI chat sessions where operators retained history between turns.
  • Fixed "amnesia" bug in NGramOverlap where backtracking erased too much history.
  • Fixed greedy decoding to be truly deterministic when temperature is set to 0.
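The greedy decoding fix can be pictured with a small sketch. This is not the library's code: `sample_token` is a hypothetical helper, and it assumes the common convention that a temperature of 0 means "take the argmax" rather than "divide the logits by zero and sample".

```python
import math
import random


def sample_token(logits, temperature, rng):
    """Pick a token index from raw logits (illustrative only)."""
    if temperature <= 0.0:
        # Greedy path: no random draw at all; ties go to the lowest index.
        return max(range(len(logits)), key=lambda i: logits[i])
    # Stochastic path: temperature-scaled softmax, then one weighted draw.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]
```

With this branching, `sample_token(logits, 0.0, rng)` returns the same index regardless of the random generator's state, which is the property the fix describes.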