
QoE performance #30

Closed
zhanggh900921 opened this issue Feb 9, 2018 · 5 comments

@zhanggh900921

zhanggh900921 commented Feb 9, 2018

Dear Hongzi:

I have run many experiments based on pensieve's source code, but I cannot reproduce the performance reported in the SIGCOMM paper (a 12%-25% improvement over Robust-MPC).

Below is the result:

First, I used the pre-trained model (i.e. pretrain_linear_reward.ckpt) provided in the source code and tested it on two sets of trace data (train_sim_traces and test_sim_traces) with ENTROPY_WEIGHT=0.5:

[Fig. 1: author model, test 1]
[Fig. 2: author model, test 2]

We can see that pensieve outperformed Robust-MPC by only about 6-7%.

Second, I did the training myself. I fixed the bug mentioned in #20 and followed the ENTROPY_WEIGHT tuning strategy in #11. I also selected the model based on a validation set (part of the trace data provided in the source code) to avoid the fluctuation issue in #28. The QoE function is the linear one:

[Fig. 3]

We can see that the final testing performance is similar to that in #11, but much worse than the performance in the SIGCOMM paper.
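For reference, the linear QoE discussed here can be sketched as follows (the constants mirror those in pensieve's sim/multi_agent.py; treat the exact values as assumptions and check them against the source):

```python
# Minimal sketch of the linear QoE reward for one video chunk.
# Constants assumed from pensieve's sim/multi_agent.py:
VIDEO_BIT_RATE = [300, 750, 1200, 1850, 2850, 4300]  # Kbps, one per level
REBUF_PENALTY = 4.3    # reward lost per second of rebuffering
SMOOTH_PENALTY = 1.0   # weight on bitrate-switch magnitude
M_IN_K = 1000.0        # Kbps -> Mbps

def linear_qoe(bit_rate, last_bit_rate, rebuf_sec):
    """Bitrate utility minus rebuffering and smoothness penalties."""
    return (VIDEO_BIT_RATE[bit_rate] / M_IN_K
            - REBUF_PENALTY * rebuf_sec
            - SMOOTH_PENALTY * abs(VIDEO_BIT_RATE[bit_rate]
                                   - VIDEO_BIT_RATE[last_bit_rate]) / M_IN_K)
```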

Did I do something wrong, or miss something important, that prevents me from getting the same results you described in the paper?
Could you please give me a hand with these questions? Any answer is highly appreciated.

Thanks a lot

@hongzimao
Owner

hongzimao commented Feb 9, 2018

Thanks for the effort of reproducing the results!

For your first figure, "...and the ENTROPY_WEIGHT=0.5", when you load a model and do testing, there is no need to set an entropy weight. The entropy only affects the exploration during training. Therefore, you want the entropy to be large in the beginning of the training phase and then decay it to a small value.

As for your own training beyond our pre-trained model, I'm surprised it doesn't beat the pre-trained model (sim_rl~43 in the figure). The best performance I've heard others achieve is around 47 on the linear QoE (which outperforms us; I'm trying to reproduce that too). Can you let me know the exact steps you used for training? How many iterations did you run; how did you change the entropy weight; what set of traces did you use for training, etc.?

About the results we reported, the largest performance gain we observed is with QoE_hd. To reproduce that result, a sanity check you can do after training a model is the one in figure 3b: you should see the agent almost always alternate between those bitrate levels, not something in between. If your agent achieves that, you should observe a performance gain similar to the paper's.
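The figure-3b check can be approximated with a small helper that measures how often the agent picks from a given set of bitrate levels; the preferred set and threshold below are illustrative assumptions, not values taken from the paper:

```python
# Hypothetical sanity check for the figure-3b behavior: under QoE_hd the
# trained agent should almost always choose from a small set of bitrate
# levels rather than intermediate ones.
def alternation_fraction(chosen_levels, preferred=(0, 4, 5)):
    """Fraction of per-chunk decisions that fall in the preferred level set."""
    preferred = set(preferred)
    return sum(1 for b in chosen_levels if b in preferred) / len(chosen_levels)
```

A run could then be flagged when, say, `alternation_fraction(levels) < 0.9` over the test traces (the 0.9 cutoff is an assumption).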

Hope these help.

@zhanggh900921
Author

zhanggh900921 commented Feb 10, 2018

Dear Hongzi:

Thanks for your reply.

Training set: train_sim_traces
Validation set: test_sim_traces

0-19999 epoch ENTROPY_WEIGHT = 5
20000-39999 epoch ENTROPY_WEIGHT = 1
40000-79999 epoch ENTROPY_WEIGHT = 0.5
80000-99999 epoch ENTROPY_WEIGHT = 0.3
100000-120000 epoch ENTROPY_WEIGHT = 0.1
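The staged decay above can be expressed as a simple epoch-to-weight lookup (a sketch of the commenter's schedule; pensieve itself hardcodes a single ENTROPY_WEIGHT constant, so in practice this was applied manually by restarting training):

```python
# Entropy-weight schedule as described above: large early for exploration,
# decayed in stages. Boundaries are the commenter's, not pensieve defaults.
def entropy_weight(epoch):
    schedule = [(20000, 5.0),    # epochs 0-19999
                (40000, 1.0),    # epochs 20000-39999
                (80000, 0.5),    # epochs 40000-79999
                (100000, 0.3),   # epochs 80000-99999
                (120001, 0.1)]   # epochs 100000-120000
    for end, weight in schedule:
        if epoch < end:
            return weight
    return 0.1  # keep the final value if training runs longer
```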

Steps for training:

  1. Put training data (train_sim_traces) in sim/cooked_traces and validation data (test_sim_traces) in sim/cooked_test_traces;
  2. Run python multi_agent.py to train the model;
  3. When the value of ENTROPY_WEIGHT needs to be changed,
    I stop the program, load the previously trained model with the best validation performance (using test_sim_traces for validation), change the value of ENTROPY_WEIGHT following the above strategy, then re-run the python script.

Is there anything wrong with my training?
Thanks a lot.

@hongzimao
Owner

These settings look reasonable. How do you pick the "best validation performance"? (By the way, the test set shouldn't be the validation set, but that is okay for debugging purposes.)

One other caveat is that the data in the Dropbox link is a subset ("sample" data) of what we used for full training. Giving the agent more diverse data helps it learn a better and more robust model. To generate more data, you can use the code in trace/. You might want to do this if your goal is to get the best learned model. We provide a minimal dataset mainly so that others can quickly reproduce the first-order results.

Nonetheless, for better understanding and debugging (I'm not sure whether anything was missed during your training), I would try QoE_hd https://github.com/hongzimao/pensieve/blob/master/sim/multi_agent.py#L260-L263 and do the sanity check from figure 3b (mentioned above).
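A hedged sketch of what the linked QoE_hd reward looks like (the HD_REWARD table and rebuffer penalty below are assumptions; check them against the linked lines of multi_agent.py before relying on them):

```python
# Sketch of an HD-weighted QoE: HD levels get disproportionately higher
# utility, so the agent is pushed to alternate between extremes.
# HD_REWARD values and REBUF_PENALTY are assumed, not verified.
HD_REWARD = [1, 2, 3, 12, 15, 20]  # per-level utility (assumed)
REBUF_PENALTY = 8.0                # assumed rebuffering penalty for QoE_hd
SMOOTH_PENALTY = 1.0

def hd_qoe(bit_rate, last_bit_rate, rebuf_sec):
    """HD utility minus rebuffering and smoothness penalties."""
    return (HD_REWARD[bit_rate]
            - REBUF_PENALTY * rebuf_sec
            - SMOOTH_PENALTY * abs(HD_REWARD[bit_rate]
                                   - HD_REWARD[last_bit_rate]))
```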

Hope that helps.

@zhanggh900921
Author

zhanggh900921 commented Feb 12, 2018

Dear Hongzi:

Thanks for your suggestions, I will try QoE_hd later.

I just fixed a bug for picking the model with the best validation performance.

My current method is:

When the value of ENTROPY_WEIGHT needs to be changed,
I stop the program, load the previously trained model with the highest mean reward (using test_sim_traces for validation), then .....
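Checkpoint selection by highest mean validation reward can be sketched as follows (`mean_rewards` is a hypothetical mapping from training epoch to mean reward over the validation traces, e.g. computed from the test-script logs):

```python
# Pick the saved model whose validation mean reward is highest.
# mean_rewards: dict of {epoch: mean reward over validation traces}.
def best_checkpoint(mean_rewards):
    """Return (epoch, reward) for the checkpoint with the highest mean reward."""
    epoch = max(mean_rewards, key=mean_rewards.get)
    return epoch, mean_rewards[epoch]
```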

The updated performance is:
[updated performance figure]

which is similar to the pre-trained model provided by you (i.e., a 6-7% improvement).

Does this method make sense, and is it the same as yours (except for the choice of validation set)?

Thanks

@hongzimao
Owner

This makes sense. Thanks.
