Commit ee77ae5
fix(policy_value_model): value head was not learning from hidden state
- Making the value head a single-layer projection of the hidden state was probably a mistake; the head was not learning from the hidden state.
- Reduce the output sequence and pass the result through a dense layer with ReLU activation.

Parent: 460f80c
1 file changed: 9 additions, 2 deletions.

Diff hunks:
- @@ -63,7 +63,12 @@: original line 66 removed; new lines 66-71 added.
- @@ -114,7 +119,9 @@: original line 117 removed; new lines 122-124 added.
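The changed source lines are not shown above, but the commit message describes the new value head: reduce the output sequence, then apply a dense layer with ReLU. A minimal dependency-free sketch of that shape follows. It assumes the reduction is a mean over sequence positions (the commit does not say whether it is a mean, a sum, or a last-token pick) and a final linear layer producing one scalar value; the function names and weight layouts are hypothetical, not the repository's code.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def mean_pool(hidden: Sequence[Sequence[float]]) -> Vector:
    """Reduce a (seq_len, d_model) sequence of hidden states to one d_model vector.

    ASSUMPTION: the commit only says "reduce"; mean pooling is used here.
    """
    seq_len = len(hidden)
    d_model = len(hidden[0])
    return [sum(step[j] for step in hidden) / seq_len for j in range(d_model)]


def dense(x: Sequence[float], weights: Matrix, bias: Sequence[float]) -> Vector:
    """Affine layer; weights has shape (out_dim, in_dim), bias has length out_dim."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x: Sequence[float]) -> Vector:
    """Element-wise ReLU."""
    return [max(0.0, v) for v in x]


def value_head(hidden, w_hidden, b_hidden, w_out, b_out) -> float:
    """Hypothetical value head: pool the sequence, ReLU dense layer, then a scalar projection."""
    pooled = mean_pool(hidden)
    activated = relu(dense(pooled, w_hidden, b_hidden))
    # ASSUMPTION: a single output unit gives the scalar value estimate.
    return dense(activated, w_out, b_out)[0]


if __name__ == "__main__":
    hidden = [[1.0, -2.0], [3.0, 0.0]]  # seq_len=2, d_model=2
    w_hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 hidden units
    b_hidden = [0.0, 0.0, 0.5]
    w_out = [[0.5, 1.0, -1.0]]
    b_out = [0.25]
    print(value_head(hidden, w_hidden, b_hidden, w_out, b_out))  # -0.25
```

Compared with projecting the hidden state straight to a scalar, the extra ReLU layer lets the head learn a nonlinear function of the pooled features before producing the value.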