🐛 Describe the bug
My training job sometimes hangs after a number of steps. I used Claude Code to debug and found the following; please help:
Summary
Training stops and hangs after step 12 when max_policy_age is set to 0 (via off_by_n: 0). The training process stalls silently with no error messages, while the Julia Docker container continues running and processing requests.
Root Cause
A race condition between weight updates and buffer sampling when max_policy_age is 0:
Configuration:
```yaml
# apps/openenv/llama3_8b_julia.yaml:11,116
off_by_n: 0
max_policy_age: ${off_by_n}  # This is 0!
```
The Deadly Sequence:
- Training completes step 12 → increments training_step to 13
- Buffer sampling calls _evict(curr_policy_version=13)
- Eviction logic (replay_buffer.py:36) removes ALL episodes where policy_version < 13
- Generator is still on version 12 (weight updates take 17-30 seconds)
- New rollouts create episodes with generator_version=12
- These episodes are evicted immediately on the next sample call
Result: Infinite wait for version 13 episodes that never arrive → buffer becomes empty → sample() returns None → training deadlock
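Here is a minimal, self-contained sketch of the failure mode. The ToyReplayBuffer class, its fields, and the batch_size value are illustrative assumptions, not the real replay_buffer.py code; only the _evict/sample names and the version check come from the description above:

```python
from dataclasses import dataclass, field

@dataclass
class Episode:
    policy_version: int  # version of the generator that produced this rollout

@dataclass
class ToyReplayBuffer:
    """Minimal stand-in for the real buffer; only the eviction rule is modeled."""
    max_policy_age: int
    batch_size: int
    episodes: list = field(default_factory=list)

    def add(self, episode: Episode) -> None:
        self.episodes.append(episode)

    def _evict(self, curr_policy_version: int) -> None:
        # With max_policy_age = 0, only episodes whose policy_version equals
        # the trainer's current version survive.
        min_version = curr_policy_version - self.max_policy_age
        self.episodes = [e for e in self.episodes if e.policy_version >= min_version]

    def sample(self, curr_policy_version: int):
        self._evict(curr_policy_version)
        if len(self.episodes) < self.batch_size:
            return None  # trainer keeps waiting -> hang if nothing ever qualifies
        return self.episodes[: self.batch_size]

# Trainer just finished step 12 and bumped its version to 13, but the
# generator is still on version 12 while the weight push is in flight.
buf = ToyReplayBuffer(max_policy_age=0, batch_size=8)
for _ in range(8):
    buf.add(Episode(policy_version=12))  # fresh rollouts, still tagged v12

print(buf.sample(curr_policy_version=13))  # -> None: every v12 episode was evicted
```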
Evidence from Logs
Timeline:
21:19:45 - RLTrainer pushes weights for version 12
21:19:57 - Push completes (12s)
21:20:15 - Generator FINISHES updating to v12 (30s total!)
Metrics:
buffer/sample/avg_data_utilization: 1.0 (only 8 episodes in buffer - the minimum needed)
buffer/evict/sum_episodes_evicted: 8.0 (evicting 8 episodes per step)
Last logged activity: 21:20:15, then silence
Training completed only 12 out of 1500 steps
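For comparison, the same version check with an age tolerance of 1 (i.e. off_by_n: 1) would let the version-12 rollouts survive the step-13 eviction while the generator catches up. The helper below is hypothetical, written only to show the arithmetic:

```python
def survives_eviction(episode_version: int, curr_policy_version: int, max_policy_age: int) -> bool:
    """Mirror of the eviction check: keep episodes at most max_policy_age versions old."""
    return episode_version >= curr_policy_version - max_policy_age

# Trainer on version 13, generator still producing version-12 rollouts:
print(survives_eviction(12, 13, max_policy_age=0))  # False -> buffer drains, training hangs
print(survives_eviction(12, 13, max_policy_age=1))  # True  -> v12 rollouts remain usable
```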
Versions
Using my branch.