
Why is the result not better than MPC? #11

Closed
fangxiaoran opened this issue Nov 3, 2017 · 7 comments

Comments

@fangxiaoran

Hi Hongzi,

I tried to reproduce the results of Pensieve. After several attempts, I failed to get an ideal result (better performance than MPC). Here is the procedure I used. The code was downloaded from GitHub, and the trace files were obtained from Dropbox:

  1. Put the training data (train_sim_traces) in sim/cooked_traces and the testing data (test_sim_traces) in sim/cooked_test_traces;
  2. Run python multi_agent.py to train the model;
  3. Copy the generated model files to test/model, and modify the model name in test/rl_no_training.py;
  4. Run python rl_no_training.py in the test/ folder to test the model; the trace files in test_sim_traces are used here as well;
  5. Run python plot_results.py to compare the results with the DP and MPC methods.

Here are two figures showing the total reward and the CDF. We can see that the performance of Pensieve is not better than MPC's.
[figure: total reward comparison]
[figure: reward CDF comparison]

Here is a TensorBoard screenshot; training has run for about 160,000 steps.
[figure: TensorBoard screenshot]

I found that the results are not very stable after long training (more than 10,000 iterations), so models saved at nearby steps perform differently at test time. For example, the model at 164,500 steps got a reward of 35.2, while the model at 164,600 steps got 33.7.

Did I do something wrong that prevents me from getting the results described in the paper? The pretrain_linear_reward model performs well; how did you obtain it? Any help with these questions would be highly appreciated.

Thanks!

@hongzimao
Owner

hongzimao commented Nov 3, 2017

What you did is all correct. But as we stated in the paper, as well as in the README.md in /sim:

As reported by the A3C paper (http://proceedings.mlr.press/v48/mniha16.pdf) and a faithful implementation (https://openreview.net/pdf?id=Hk3mPK5gg), we also found the exploration factor in the actor network quite crucial for achieving good performance. A general strategy to train our system is to first set ENTROPY_WEIGHT in a3c.py to be a large value (in the scale of 1 to 5) in the beginning, then gradually reduce the value to 0.1 (after at least 100,000 iterations).

You can easily achieve an automatic exploration decay for ENTROPY_WEIGHT. The reason we didn't do this explicitly is to let others see the effect of this parameter, as you just discovered :).
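For example, a simple step schedule along these lines should be enough (the boundaries and values here are only illustrative, consistent with the README's guidance, and not the exact schedule we used for the pretrained models):

```python
# Illustrative step-decay schedule for the exploration factor.
# The boundaries and values are examples only, following the README's
# guidance (start in the 1-5 range, end around 0.1 after ~100,000 iterations).
def entropy_weight(epoch):
    """Entropy weight to use at a given training epoch."""
    if epoch < 20000:
        return 5.0
    elif epoch < 50000:
        return 2.0
    elif epoch < 100000:
        return 1.0
    return 0.1
```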

Hope this helps.

@fangxiaoran
Author

Thanks for your answer!

I've tried decaying ENTROPY_WEIGHT. At first, I used the following schedule:
  • iterations 0-19,999: ENTROPY_WEIGHT = 5
  • iterations 20,000-39,999: ENTROPY_WEIGHT = 4
  • iterations 40,000-59,999: ENTROPY_WEIGHT = 3
  • iterations 60,000-69,999: ENTROPY_WEIGHT = 2
  • iterations 70,000-79,999: ENTROPY_WEIGHT = 1
  • iterations 80,000-89,999: ENTROPY_WEIGHT = 0.5
  • iterations 90,000-100,000: ENTROPY_WEIGHT = 0.01
But I got a negative reward, as shown below.
[figure: total reward]

Then I changed the initial value of ENTROPY_WEIGHT to 1 (still decaying to 0.01 after 100,000 epochs). This time I got a reward of about 20, which is better than the first schedule but still worse than I expected.

This makes me think the result depends on a proper ENTROPY_WEIGHT decay schedule. Is that true? How did you implement the decay?

@hongzimao
Owner

hongzimao commented Nov 6, 2017

Did you load the trained model from the previous run when you decayed the factor? We (as well as others who have reproduced it; there are already some posts in the issues) didn't do anything fancy; a plain decay applied once or twice should work.
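Concretely, warm-starting the next run from the last checkpoint looks roughly like this (a minimal sketch with illustrative names and paths, not the exact code in the training script):

```python
import tensorflow as tf  # TensorFlow 1.x, as used by the repo

# Illustrative sketch: restore the weights saved by the previous run, then
# keep training with the reduced ENTROPY_WEIGHT. Names and paths are examples.
PREV_MODEL = "./results/nn_model_ep_100000.ckpt"  # checkpoint of the previous run

# ... build the actor/critic networks exactly as before ...
params = tf.get_variable("params", shape=[6, 128])  # stand-in for the real parameters

saver = tf.train.Saver()
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver.restore(sess, PREV_MODEL)  # overwrite the fresh init with the trained weights
    # ... continue the usual training loop here ...
```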

@fangxiaoran
Author

I figured out the problem. As you said, I should stop the program, load the previously trained model, and then re-run the Python script. I got good results this way. At first, however, I had simply made ENTROPY_WEIGHT a member variable of the actor class and changed its value inside the while loop, and that didn't work well.

Why does re-running work differently from my method? Both methods keep the previously trained model, but re-running resets the optimizer. Is that the reason?

@hongzimao hongzimao reopened this Nov 9, 2017
@hongzimao
Owner

hongzimao commented Nov 9, 2017

I'm glad you got good performance 👍

As for automatically decaying the exploration factor, notice that ENTROPY_WEIGHT sets a constant in the TensorFlow computation graph (e.g., https://github.com/hongzimao/pensieve/blob/master/sim/a3c.py#L47-L52), so changing the Python attribute after the graph is built has no effect. To make it tunable during execution, you need to specify a TensorFlow placeholder and feed its value each time.
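For instance (TensorFlow 1.x, illustrative names only; this is not our internal implementation):

```python
import tensorflow as tf  # TensorFlow 1.x

# The entropy weight becomes a scalar placeholder instead of a Python constant,
# so a new value can be fed on every training step. Names are illustrative.
entropy_weight_ph = tf.placeholder(tf.float32, shape=[], name="entropy_weight")

# Stand-in for the actor's output distribution (the real one comes from the
# policy network built in a3c.py).
logits = tf.get_variable("logits", shape=[1, 6])
probs = tf.nn.softmax(logits)

entropy = -tf.reduce_sum(probs * tf.log(probs + 1e-6))  # exploration bonus
loss = -entropy_weight_ph * entropy                     # plus the usual policy-gradient loss
train_op = tf.train.RMSPropOptimizer(1e-4).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(100):
        # Feed the decayed value each step, e.g. from a schedule function.
        sess.run(train_op, feed_dict={entropy_weight_ph: max(5.0 - 1e-4 * epoch, 0.1)})
```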

I think any reasonable decay function should work (e.g., linear, step function, etc.). If you manage to get that working, could you post your result (maybe open another issue)? Although we have an internal implementation (we didn't post it because (1) it's fairly easy to implement and (2) more importantly, we intentionally want others to observe this effect), we would appreciate it a lot if someone could reproduce and improve on it. Thanks!

@fangxiaoran
Author

Sure. I'll try using a placeholder and will post my result if it works. The following are the current results.

  • ENTROPY_WEIGHT = 5, epochs 1-20,000
    [training curve]
  • ENTROPY_WEIGHT = 1, epochs 20,001-40,000
    [training curve]
  • ENTROPY_WEIGHT = 0.5, epochs 40,001-80,000
    [training curve]
  • ENTROPY_WEIGHT = 0.3, epochs 80,001-100,000
    [training curve]
  • ENTROPY_WEIGHT = 0.1, epochs 100,001-120,000
    [training curve]

@Wannabeperfect

I want to know why the CDF results are not smaller than 100. Is this correct?
