Commit 02b11ee
fix(training): use n-step windows during a3c training
- the current model has a bug where it uses the entire episode history during training, so predictions get progressively slower as episodes get longer.
- clip to 3-step windows by default, because that seems to be a popular choice and the a3c agent seems capable of using it to quickly solve easy poly problems

1 parent 92695d6
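The change in the message above, training on a fixed n-step window instead of the whole episode history, can be sketched roughly as follows. This sketch assumes the history is a plain Python sequence, and that the critic's value estimate of the state after the window bootstraps the discounted return, as in standard n-step A3C updates. The names `n_step_window` and `n_step_returns` and the default `gamma=0.99` are hypothetical illustrations, not mathy's actual API. Only the 3-step default comes from the commit.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


# Hypothetical helper (not mathy's API): keep only the most recent n steps.
def n_step_window(history: Sequence[T], n_steps: int = 3) -> List[T]:
    """Return the last ``n_steps`` items of an episode history.

    Training on a fixed-size window keeps the cost of each update constant,
    instead of letting it grow with the episode length as it does when the
    entire history is used.
    """
    if n_steps <= 0:
        raise ValueError("n_steps must be positive")
    # Slicing clamps, so histories shorter than n_steps are returned whole.
    return list(history[-n_steps:])


# Hypothetical helper (not mathy's API): n-step discounted return targets.
def n_step_returns(
    rewards: Sequence[float], bootstrap_value: float, gamma: float = 0.99
) -> List[float]:
    """Discounted return for each step in the window.

    ``bootstrap_value`` is the critic's estimate V(s) for the state that
    follows the window's last step; pass 0.0 if the episode terminated.
    """
    returns: List[float] = []
    running = bootstrap_value
    # Walk backwards through the window: R_t = r_t + gamma * R_{t+1}
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns
```

For example, with a terminal episode (`bootstrap_value=0.0`) and `gamma=0.9`, `n_step_returns(n_step_window([0.0, 0.0, -0.01, 0.0, 1.0]), 0.0, 0.9)` only looks at the last three rewards and yields approximately `[0.8, 0.9, 1.0]`, no matter how long the episode was.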
2 files changed in libraries/mathy_python/mathy/agents: 3 additions, 4 deletions.

File 1:

| Old line | New line | Change |
|---|---|---|
| 58 | 58 | |
| 59 | 59 | |
| 60 | 60 | |
| 61 | | - |
| | 61 | + |
| 62 | 62 | |
| 63 | | - |
| | 63 | + |
| 64 | 64 | |
| 65 | 65 | |
| 66 | 66 | |

File 2, first hunk:

| Old line | New line | Change |
|---|---|---|
| 4 | 4 | |
| 5 | 5 | |
| 6 | 6 | |
| 7 | | - |
| 8 | 7 | |
| 9 | 8 | |
| 10 | 9 | |

File 2, second hunk:

| Old line | New line | Change |
|---|---|---|
| 275 | 274 | |
| 276 | 275 | |
| 277 | 276 | |
| 278 | | - |
| | 277 | + |
| 279 | 278 | |
| 280 | 279 | |
| 281 | 280 | |