I further investigated the key reasons why RL-based methods appear to perform poorly in 5G. Here are the key findings.
- The state inputs have not been normalized to the proper scale, while recent work shows that state normalization is critical for RL-based tasks [1]. For example, state[2] represents the throughput in Mbps, but 5G bandwidth commonly exceeds 100 Mbps and can even reach 1 Gbps, resulting in unexpectedly large values. The same issue appears in the video chunk sizes (MB).
Meanwhile, the reward function requires reward scaling, since the maximum bitrate exceeds 160 Mbps (see the normalization sketch after this list).
- Another issue is inherited from Pensieve: the download time measured by the RL server is chunk download end - chunk download start, whereas the simulator computes it as chunk download end - request start. Thus there is a gap between the two, i.e., one RTT. Such gaps are acceptable for 3G and 4G traces, where the transfer time dominates, but under 5G bandwidth the transfer time shrinks and the RTT becomes a significant fraction (see the second sketch below).
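To make the first point concrete, here is a minimal Python sketch of state normalization and reward scaling for a 5G setting. The constants (MAX_THROUGHPUT_MBPS, MAX_CHUNK_SIZE_MB, the rebuffer/smoothness weights) are illustrative assumptions, not the exact values used in Pensieve or in our retraining:

```python
import numpy as np

# Illustrative scaling constants for a 5G setting; the exact values are
# assumptions and should be derived from the trace/video statistics.
MAX_THROUGHPUT_MBPS = 1000.0   # 5G links can approach 1 Gbps
MAX_CHUNK_SIZE_MB = 40.0       # 5G-era chunks are far larger than 3G/4G ones
MAX_BITRATE_MBPS = 160.0       # highest bitrate level in the ladder

def normalize_state(throughput_mbps, chunk_size_mb, buffer_s, max_buffer_s=60.0):
    """Rescale raw observations to roughly [0, 1] before feeding the policy."""
    return np.array([
        throughput_mbps / MAX_THROUGHPUT_MBPS,
        chunk_size_mb / MAX_CHUNK_SIZE_MB,
        buffer_s / max_buffer_s,
    ])

def scale_reward(bitrate_mbps, rebuffer_s, smooth_penalty_mbps,
                 rebuf_weight=4.3, smooth_weight=1.0):
    """QoE-style reward, divided by the maximum bitrate so it stays O(1)."""
    qoe = (bitrate_mbps
           - rebuf_weight * rebuffer_s
           - smooth_weight * smooth_penalty_mbps)
    return qoe / MAX_BITRATE_MBPS
```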
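And a quick back-of-the-envelope check of the download-time gap. The 80 ms RTT and the 4 MB chunk size are assumed values for illustration:

```python
def relative_gap(chunk_size_mb, bandwidth_mbps, rtt_s=0.08):
    """Fraction of the simulator's download time that is pure RTT."""
    transfer_s = chunk_size_mb * 8.0 / bandwidth_mbps  # server-side view
    simulated_s = transfer_s + rtt_s                   # simulator's view
    return rtt_s / simulated_s

# A 4 MB chunk: on a 10 Mbps 4G link the RTT is ~2% of the measured
# time, but on a 1 Gbps 5G link it dominates (~71%).
print(relative_gap(4, 10))    # ≈ 0.024
print(relative_gap(4, 1000))  # ≈ 0.714
```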
Putting these together, we retrained Pensieve on 5G datasets and videos: godka/Pensieve-5G.
However, due to the coronavirus outbreak, we were not allowed to work in the lab, so we employed trace-driven simulation rather than emulation. We first confirmed the correctness of the simulation environment by comparing simulation and emulation results using the pre-trained Pensieve model (i.e., nn_model_44200.ckpt). The original emulation results vs. our simulation results are shown below.
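For reference, here is a toy sketch of what simulating a single chunk download against a bandwidth trace could look like. It assumes the trace is a list of (duration_s, bandwidth_mbps) segments and is an illustrative simplification, not Pensieve's actual simulator:

```python
def simulate_chunk_download(trace, seg_idx, chunk_size_mb, rtt_s=0.08):
    """Walk a (duration_s, bandwidth_mbps) trace to compute one chunk's
    download time, measured from the request (so it includes one RTT,
    matching the simulator's definition). Returns (elapsed_s, new_seg_idx).
    """
    remaining_mb = chunk_size_mb
    elapsed = rtt_s  # request round trip
    i = seg_idx
    while remaining_mb > 0:
        duration_s, bw_mbps = trace[i % len(trace)]
        sent_mb = bw_mbps * duration_s / 8.0
        if sent_mb >= remaining_mb:
            # Chunk finishes within this trace segment.
            elapsed += remaining_mb * 8.0 / bw_mbps
            remaining_mb = 0
        else:
            # Consume the whole segment and move to the next one.
            elapsed += duration_s
            remaining_mb -= sent_mb
            i += 1
    return elapsed, i
```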
Next, we report our retraining results here. As shown, Pensieve can also achieve good performance in 5G networks.
Moreover, we plot the results of Pensieve's last 50,000 training epochs, i.e., Pensieve's Pareto frontier. As expected, Pensieve consistently meets the requirements.
In general, we argue that Pensieve can also perform well in 5G networks. The key reason for the poor results in the paper is the lack of standard RL tricks (such as state normalization) and the sim-to-real gap, rather than Pensieve itself. We also provide the pre-trained model here. Moreover, another team ran the model on our emulation setup and got similar results to those we posted (plot attached).
Reference
[1] S. Abbasloo, C.-Y. Yen, and H. J. Chao. Classic Meets Modern: A Pragmatic Learning-Based Congestion Control for the Internet. In Proceedings of ACM SIGCOMM 2020, pp. 632-647.