# Multi-Objective Optimization

Multi-objective problems (MOPs) are more common in everyday life and society than many people realize. Every day we make decisions that compromise among different objectives. A car buyer, for example, wants a lower price and lower gas consumption, but also a higher level of comfort and performance. With a limited budget, these objectives conflict with each other, so the buyer has to balance price and gas consumption against comfort and performance. At the level of society, a central bank’s monetary policy has to balance the inflation rate, the unemployment rate, the trade deficit, and other economic factors. A lower interest rate may reduce unemployment but may also increase the risk of inflation. Similar situations arise in other areas, such as engineering and communication design. In each case, the decision maker (DM) wants to optimize more than one objective, and in most cases those objectives conflict with one another.

## Problem formulation

A multi-objective optimization problem is an optimization problem in which several possibly conflicting objectives are being optimized simultaneously. It can be defined as follows:

$$\min_{w \in R^D} L(w) = \min_{w \in R^D} \begin{bmatrix} L_1(w) & L_2(w) & \dots & L_n(w) \end{bmatrix}^{\top}
$$

where $n$ is the number of objectives to optimize, $w$ is the vector of model parameters, $D$ is the total number of parameters, each $L_i:R^D \rightarrow R,\ i=1,\dots,n$, is a single-objective loss function, and $L$ is the multi-objective loss function. The operator $\min$ denotes the simultaneous minimization of all objectives. Restricting the formulation to minimization loses no generality, because any maximization problem can be transformed into a minimization problem, for example by negating the objective.
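As a minimal sketch of this formulation, the snippet below evaluates a vector-valued loss $L(w)$ for $n = 2$ and $D = 2$. The two quadratic objectives `l1` and `l2`, and the helper names `multi_objective_loss` and `as_minimization`, are hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def l1(w: Vector) -> float:
    """Hypothetical objective 1: squared distance from the point (1, 0)."""
    return (w[0] - 1.0) ** 2 + w[1] ** 2


def l2(w: Vector) -> float:
    """Hypothetical objective 2: squared distance from the point (-1, 0)."""
    return (w[0] + 1.0) ** 2 + w[1] ** 2


def multi_objective_loss(
    w: Vector, objectives: List[Callable[[Vector], float]]
) -> List[float]:
    """Evaluate L(w) = [L_1(w), ..., L_n(w)] as a list of n scalars."""
    return [objective(w) for objective in objectives]


def as_minimization(gain: Callable[[Vector], float]) -> Callable[[Vector], float]:
    """Turn a quantity to maximize into a loss to minimize by negating it."""
    return lambda w: -gain(w)


objectives = [l1, l2]
at_first_optimum = multi_objective_loss([1.0, 0.0], objectives)
at_second_optimum = multi_objective_loss([-1.0, 0.0], objectives)
at_midpoint = multi_objective_loss([0.0, 0.0], objectives)
print(at_first_optimum, at_second_optimum, at_midpoint)
```

In this toy setup each objective has its own minimizer, and moving toward one increases the other, so no single $w$ minimizes both losses at once.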

When a gradient-based optimization algorithm is applied, the gradient of every constituent loss function, $\nabla_wL_i(w)$, has to be Lipschitz continuous (Murphy, 2013).
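For reference, Lipschitz continuity of a gradient is usually stated as below. The constant $K_i$ and the second point $w'$ are notation introduced here for illustration and are not taken from the cited source.

$$\lVert \nabla_w L_i(w) - \nabla_w L_i(w') \rVert \le K_i \, \lVert w - w' \rVert \quad \text{for all } w, w' \in R^D
$$

Intuitively, the gradient cannot change arbitrarily fast between nearby parameter vectors.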

## Scenario: Optimal fruit choice

Let's understand this with an example. Imagine you want to eat a fruit, and you have two conditions: the fruit should be both healthy and tasty. Based on your knowledge, you draw the following diagram to decide which fruit to eat:

<p><center><figure><img src='_images/L069335_1.png'><figcaption>[source](https://www.alexirpan.com/public/fruit-opinion/xkcdfruit.png)</figcaption></figure></center></p>

So which one is the best to eat? There is no single optimal choice here but three: peaches, strawberries, and seedless grapes. These three fruits lie on the Pareto front (don't worry, we will come to this concept shortly). They are called non-dominated and are therefore the optimal choices.

## Scenario: Cement factory

Let's say we run a cement factory and are looking to optimize our profits. Regulations and ethics also require us to take care of the health impact on nearby communities. Let's see how we can formulate this situation as a multi-objective optimization (MOO) problem. Objective 1 could be to maximize production in order to maximize profit. Objective 2 could be to minimize hazardous waste in order to protect the health of nearby communities. How do we resolve the conflict between these objectives? By compromising. Given these objectives, one way to formulate our goal is ***Max(Profit) s.t. Produced hazards ≤ Max allowable value.*** Alternatively, we can formulate our goal as ***Max(Profit/Hazards)***.
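The two formulations above can be compared on a toy model. The following is a minimal sketch, not a real factory model: the `profit` and `hazards` functions, the candidate production levels, and the limit `MAX_ALLOWED_HAZARD` are all made-up assumptions.

```python
def profit(units: int) -> float:
    """Hypothetical profit: grows linearly with production."""
    return 10.0 * units


def hazards(units: int) -> float:
    """Hypothetical hazardous output: grows quadratically with production."""
    return 0.1 * units ** 2 + 1.0


# Assumed search space and regulatory limit (illustrative values only).
candidate_levels = range(0, 11)
MAX_ALLOWED_HAZARD = 5.0

# Formulation 1: Max(Profit) s.t. Produced hazards <= Max allowable value.
feasible = [u for u in candidate_levels if hazards(u) <= MAX_ALLOWED_HAZARD]
best_constrained = max(feasible, key=profit)

# Formulation 2: Max(Profit / Hazards).
best_ratio = max(candidate_levels, key=lambda u: profit(u) / hazards(u))

print("constrained formulation picks:", best_constrained)
print("ratio formulation picks:", best_ratio)
```

Under these assumed functions the two formulations select different production levels, which shows that the way the compromise is formulated is itself a modeling decision.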

## Scenario: Accuracy vs speed tradeoff

We can frame the multi-objective optimization problem as a search for optimal tradeoffs. Let’s imagine that we really care about exactly two objectives: predictive accuracy, and the speed at which we can make a prediction.

Unfortunately, these things are likely to be in tension. It may be possible to construct a very accurate classifier by using extremely large models, or stacking several ML algorithms, or by performing many complex feature transformations. All of these things increase the computation necessary to make a prediction, and thus slow us down.

Imagine we randomly sampled hyperparameter configurations and measured the speed and accuracy of the resulting models. We would surely find some configurations that produce models that are both slower and less accurate than others. Speaking technically, if one point, call it `A`, is strictly better than another point, `B`, in at least one dimension and at least as good in all other dimensions, we say `A` *dominates* `B`. We’d never want to deploy a dominated model, since some other model is at least as good on both optimization objectives and strictly better on at least one.

<p><center><figure><img src='_images/L069335_2.png'><figcaption>As we try different hyperparameter configurations, we’ll find that the resultant models represent different accuracy-speed trade offs. We can visualize each model as a point in the trade off space.</figcaption></figure></center></p>
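The dominance relation defined above can be sketched as a small predicate. This sketch assumes that higher is better in every dimension (accuracy, and speed measured as predictions per second). The function name `dominates` and the three example models are hypothetical.

```python
from typing import Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` dominates `b`, assuming higher is better everywhere.

    `a` must be at least as good as `b` in every dimension and strictly
    better in at least one.
    """
    if len(a) != len(b):
        raise ValueError("points must have the same number of objectives")
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


# Made-up (accuracy, predictions per second) measurements.
model_a = (0.92, 120.0)
model_b = (0.90, 100.0)
model_c = (0.95, 40.0)

print(dominates(model_a, model_b))  # a is better on both objectives
print(dominates(model_a, model_c))  # a is faster but less accurate
```

Note that neither `model_a` nor `model_c` dominates the other: each wins on one objective, which is exactly the situation in which a tradeoff has to be made.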

It’s possible we’d find one point that maximizes both the accuracy and speed of our predictions. In practice, this is unlikely. We might improve accuracy by using deeper trees in a random forest, but deeper trees also take longer to evaluate, so we have traded off some speed for accuracy.

Eventually, we’ll discern an edge in the accuracy-speed tradeoff space, where we cannot find a hyperparameter combination that leads to an improvement in one direction without a negative impact on the other. This edge is called the *Pareto frontier*, and allows us to make a quantitative tradeoff between our optimization objectives. The Pareto frontier is constructed from the set of non-dominated points, and choosing any one of them gives us our exact accuracy/speed tradeoff.

<p><center><figure><img src='_images/L069335_3.png'><figcaption>As we try different hyperparameter configurations, we’ll find that the resultant models represent different accuracy-speed trade offs. We can visualize each model as a point in the trade off space.</figcaption></figure></center></p>
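For exactly two objectives, the set of non-dominated points can be extracted with a single sorted sweep. The sketch below assumes higher is better for both accuracy and speed. The function name `pareto_frontier` and the sample points are made up for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (accuracy, speed), higher is better for both


def pareto_frontier(points: List[Point]) -> List[Point]:
    """Return the non-dominated points, ordered by decreasing accuracy."""
    # Visit points from most to least accurate; ties are broken by speed,
    # fastest first, so the best point of each tie is seen first.
    ordered = sorted(set(points), key=lambda p: (-p[0], -p[1]))
    frontier: List[Point] = []
    best_speed = float("-inf")
    for accuracy, speed in ordered:
        # Every point already visited is at least as accurate, so this point
        # is non-dominated only if it is strictly faster than all of them.
        if speed > best_speed:
            frontier.append((accuracy, speed))
            best_speed = speed
    return frontier


# Made-up (accuracy, predictions per second) measurements.
models = [
    (0.95, 40.0),
    (0.92, 120.0),
    (0.90, 100.0),
    (0.85, 200.0),
    (0.80, 150.0),
    (0.70, 300.0),
]
frontier = pareto_frontier(models)
print(frontier)
```

Reading the result from left to right walks along the frontier from the most accurate model to the fastest one.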

Ultimately, a deployed ML system will be trained with a single hyperparameter combination, and we must choose a single *point* in the accuracy-speed plane. The Pareto frontier allows us to present a decision maker with a host of models, some maximizing accuracy, others maximizing speed, and the entire spectrum in between.

How do we find this frontier? We could construct it with a dense random sampling of the hyperparameter search space, but this risks being inefficient: we’d like to spend as little time as possible sampling configurations that aren’t likely to expand the Pareto frontier. Every sample on the frontier is useful, because it lets us trade off accuracy and speed in a new combination. Samples inside the frontier end up being useless.
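To make the cost of random sampling concrete, the sketch below draws random configurations and counts how many end up on the frontier. The `evaluate` function is a hypothetical stand-in for training and timing a real model. Its formulas, the hyperparameter ranges, and the sample count are assumptions with no empirical basis.

```python
import random
from typing import List, Tuple

Point = Tuple[float, float]  # (accuracy, speed), higher is better for both


def evaluate(depth: int, n_trees: int, rng: random.Random) -> Point:
    """Hypothetical stand-in for training a model and measuring it."""
    cost = depth * n_trees  # made-up proxy for prediction cost
    accuracy = 0.5 + 0.45 * cost / (cost + 50.0) - rng.uniform(0.0, 0.1)
    speed = 1000.0 / (1.0 + cost)
    return accuracy, speed


def dominates(a: Point, b: Point) -> bool:
    """True if `a` is at least as good everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def non_dominated(points: List[Point]) -> List[Point]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [
    evaluate(rng.randint(1, 20), rng.randint(1, 200), rng) for _ in range(200)
]
frontier = non_dominated(samples)
print(f"{len(frontier)} of {len(samples)} samples are on the frontier")
```

Each sample that does not reach the frontier still costs a full training run in a real setting, which is why uniform random sampling can be wasteful.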

```{tableofcontents}
```
