
All 50 Papers I Read in 2019, Explained (With My Personal Top 3), by 綿岡晃輝 https://qiita.com/wataoka/items/ae782defabc3706b5c93

Fifty Deep Learning Papers, by 綿岡晃輝: word lists (shell, awk), English (25), docker (101) https://qiita.com/kaizen_nagoya/items/670d4d332e07fd2e5fc2

https://hub.docker.com/u/kaizenjapan

$ docker run -v /tmp/docker:/tmp/docker -it kaizenjapan/qc-nakamori /bin/bash
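A minimal sketch of the word-list workflow inside that container (paper.txt is a hypothetical stand-in for the text extracted from one of the papers):

# split into words, lowercase, and print the 25 most frequent
$ tr -cs 'A-Za-z' '\n' < paper.txt | tr 'A-Z' 'a-z' | sort | uniq -c | sort -rn | head -25
# the same tally written in awk
$ awk '{for (i = 1; i <= NF; i++) freq[tolower($i)]++} END {for (w in freq) print freq[w], w}' paper.txt | sort -rn | head -25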

1 A Survey on Bias and Fairness in Machine Learning

BMI
Correctional Offender Management Profiling for Alternative Sanctions (COMPAS)
Defense Advanced Research Projects Agency (DARPA)
Diversity in Faces (DiF)
Equal Credit Opportunity Act (ECOA)
Fair Housing Act (FHA)
Labeled Faces in the Wild (LFW)
maximum mean discrepancy (MMD)
price of fairness (POF)
reducing bias amplification (RBA)
Sentence Encoder Association Test (SEAT)
Science, Technology, Engineering, and Math (STEM)
Variational Auto-Encoders (VAE)
Word Embedding Association Test (WEAT)
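As a reminder of what one of those tests measures, the WEAT statistic of Caliskan et al. [25] scores two target word sets X, Y against two attribute word sets A, B via cosine similarity of word vectors (notation mine):

s(\vec{w}, A, B) = \operatorname{mean}_{a \in A} \cos(\vec{w}, \vec{a}) - \operatorname{mean}_{b \in B} \cos(\vec{w}, \vec{b})
s(X, Y, A, B) = \sum_{x \in X} s(\vec{x}, A, B) - \sum_{y \in Y} s(\vec{y}, A, B)

A large positive value means X is more associated with A (and Y with B) than chance would predict; SEAT applies the same test to sentence encoders.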

REFERENCES

[1] Alekh Agarwal, Miroslav Dudik, and Zhiwei Steven Wu. 2019. Fair Regression: Quantitative Definitions and Reduction- Based Algorithms. In International Conference on Machine Learning. 120–129.

[2] Nazanin Alipourfard, Peter G Fennell, and Kristina Lerman. 2018. Can you Trust the Trend?: Discovering Simpson’s Paradoxes in Social Data. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining. ACM, 19–27.

[3] Nazanin Alipourfard, Peter G Fennell, and Kristina Lerman. 2018. Using Simpson's Paradox to Discover Interesting Patterns in Behavioral Data. In Twelfth International AAAI Conference on Web and Social Media.

[4] Alexander Amini, Ava Soleimany, Wilko Schwarting, Sangeeta Bhatia, and Daniela Rus. 2019. Uncovering and Mitigating Algorithmic Bias through Learned Latent Structure. (2019).

[5] Julia Angwin, Jeff Larson, Surya Mattu, and Lauren Kirchner. 2016. Machine Bias: There's software used across the country to predict future criminals. And it's biased against blacks. ProPublica (2016).

[6] A. Asuncion and D.J. Newman. 2007. UCI Machine Learning Repository. (2007). http://www.ics.uci.edu/~mlearn/

[7] Arturs Backurs, Piotr Indyk, Krzysztof Onak, Baruch Schieber, Ali Vakilian, and Tal Wagner. 2019. Scalable Fair Clustering. In Proceedings of the 36th International Conference on Machine Learning (Proceedings of Machine Learning Research), Kamalika Chaudhuri and Ruslan Salakhutdinov (Eds.), Vol. 97. PMLR, Long Beach, California, USA, 405–413. http://proceedings.mlr.press/v97/backurs19a.html

[8] Ricardo Baeza-Yates. 2018. Bias on the Web. Commun. ACM 61, 6 (May 2018), 54–61. https://doi.org/10.1145/3209581

[9] Samuel Barbosa, Dan Cosley, Amit Sharma, and Roberto M. Cesar-Jr. 2016. Averaging Gone Wrong: Using Time-Aware Analyses to Better Understand Behavior. (April 2016), 829–841.

[10] Rachel KE Bellamy, Kuntal Dey, Michael Hind, Samuel C Hoffman, Stephanie Houde, Kalapriya Kannan, Pranay Lohia, Jacquelyn Martino, Sameep Mehta, Aleksandra Mojsilovic, et al. 2018. Ai fairness 360: An extensible toolkit for detecting, understanding, and mitigating unwanted algorithmic bias. arXiv preprint https://arxiv.org/abs/1810.01943 (2018).

[11] Emily M. Bender and Batya Friedman. 2018. Data Statements for Natural Language Processing: Toward Mitigating System Bias and Enabling Better Science. Transactions of the Association for Computational Linguistics 6 (2018), 587–604. https://doi.org/10.1162/tacl_a_00041

[12] Misha Benjamin, Paul Gagnon, Negar Rostamzadeh, Chris Pal, Yoshua Bengio, and Alex Shee. [n. d.]. Towards Standardization of Data Licenses: The Montreal Data License. ([n. d.]).

[13] Richard Berk, Hoda Heidari, Shahin Jabbari, Matthew Joseph, Michael Kearns, Jamie Morgenstern, Seth Neel, and Aaron Roth. 2017. A Convex Framework for Fair Regression. (2017). arXiv: https://arxiv.org/abs/cs.LG/1706.02409

[14] Richard Berk, Hoda Heidari, Shahin Jabbari, Michael Kearns, and Aaron Roth. [n. d.]. Fairness in criminal justice risk assessments: The state of the art. Sociological Methods & Research ([n. d.]), 0049124118782533.

[15] Peter J Bickel, Eugene A Hammel, and J William O’Connell. 1975. Sex bias in graduate admissions: Data from Berkeley. Science 187, 4175 (1975), 398–404.

[16] RDP Binns. 2018. Fairness in machine learning: Lessons from political philosophy. Journal of Machine Learning Research (2018).

[17] Colin R Blyth. 1972. On Simpson’s paradox and the sure-thing principle. J. Amer. Statist. Assoc. 67, 338 (1972), 364–366.

[18] Miranda Bogen and Aaron Rieke. 2018. Help wanted: An examination of hiring algorithms, equity, and bias. Technical report, Upturn.

[19] Tolga Bolukbasi, Kai-Wei Chang, James Y Zou, Venkatesh Saligrama, and Adam T Kalai. 2016. Man is to computer programmer as woman is to homemaker? debiasing word embeddings. In Advances in neural information processing systems. 4349–4357.

[20] Shikha Bordia and Samuel Bowman. 2019. Identifying and Reducing Gender Bias in Word-Level Language Models. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop. 7–15.

[21] Avishek Bose and William Hamilton. 2019. Compositional Fairness Constraints for Graph Embeddings. In International Conference on Machine Learning. 715–724.

[22] Marc-Etienne Brunet, Colleen Alkalay-Houlihan, Ashton Anderson, and Richard Zemel. 2019. Understanding the Origins of Bias in Word Embeddings. In Proceedings of the 36th International Conference on Machine Learning (Proceedings of Machine Learning Research), Kamalika Chaudhuri and Ruslan Salakhutdinov (Eds.), Vol. 97. PMLR, Long Beach, California, USA, 803–811. http://proceedings.mlr.press/v97/brunet19a.html

[23] Joy Buolamwini and Timnit Gebru. 2018. Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification. In Proceedings of the 1st Conference on Fairness, Accountability and Transparency (Proceedings of Machine Learning Research), Sorelle A. Friedler and Christo Wilson (Eds.), Vol. 81. PMLR, New York, NY, USA, 77–91. http://proceedings.mlr.press/v81/buolamwini18a.html

[24] Toon Calders and Sicco Verwer. 2010. Three naive Bayes approaches for discrimination-free classification. Data Mining and Knowledge Discovery 21, 2 (2010), 277–292.

[25] Aylin Caliskan, Joanna J Bryson, and Arvind Narayanan. 2017. Semantics derived automatically from language corpora contain human-like biases. Science 356, 6334 (2017), 183–186.

[26] Flavio Calmon, Dennis Wei, Bhanukiran Vinzamuri, Karthikeyan Natesan Ramamurthy, and Kush R Varshney. 2017. Optimized Pre-Processing for Discrimination Prevention. In Advances in Neural Information Processing Systems 30, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.). Curran Associates, Inc., 3992–4001. http://papers.nips.cc/paper/6988-optimized-pre-processing-for-discrimination-prevention.pdf

[27] Manel Capdevila, Marta Ferrer, and Eulália Luque. 2005. La reincidencia en el delito en la justicia de menores. Centro de estudios jurídicos y formación especializada, Generalitat de Catalunya. Documento no publicado (2005).

[28] Allison JB Chaney, Brandon M Stewart, and Barbara E Engelhardt. 2018. How algorithmic confounding in recommendation systems increases homogeneity and decreases utility. In Proceedings of the 12th ACM Conference on Recommender Systems. ACM, 224–232.

[29] Jiahao Chen, Nathan Kallus, Xiaojie Mao, Geoffry Svacha, and Madeleine Udell. 2019. Fairness Under Unawareness: Assessing Disparity When Protected Class Is Unobserved. In Proceedings of the Conference on Fairness, Accountability, and Transparency. ACM, 339–348.

[30] Xingyu Chen, Brandon Fain, Liang Lyu, and Kamesh Munagala. 2019. Proportionally Fair Clustering. In International Conference on Machine Learning. 1032–1041.

[31] Alexandra Chouldechova. 2017. Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Big data 5, 2 (2017), 153–163.

[32] Alexandra Chouldechova, Diana Benavides-Prado, Oleksandr Fialko, and Rhema Vaithianathan. 2018. A case study of algorithm-assisted decision making in child maltreatment hotline screening decisions. In Proceedings of the 1st Conference on Fairness, Accountability and Transparency (Proceedings of Machine Learning Research), Sorelle A. Friedler and Christo Wilson (Eds.), Vol. 81. PMLR, New York, NY, USA, 134–148. http://proceedings.mlr.press/v81/chouldechova18a.html

[33] Alexandra Chouldechova and Aaron Roth. 2018. The frontiers of fairness in machine learning. arXiv preprint arXiv: https://arxiv.org/abs/1810.08810 (2018).

[34] John S. Chuang, Olivier Rivoire, and Stanislas Leibler. 2009. Simpson’s Paradox in a Synthetic Microbial System. Science 323, 5911 (2009), 272–275. https://doi.org/10.1126/science.1166739 arXiv:https://science.sciencemag.org/content/323/5911/272.full.pdf

[35] Lee Cohen, Zachary C. Lipton, and Yishay Mansour. 2019. Efficient candidate screening under multiple tests and implications for fairness. (2019). arXiv: https://arxiv.org/abs/cs.LG/1905.11361

[36] United States. Equal Employment Opportunity Commission. [n. d.]. EEOC compliance manual. [Washington, D.C.]: U.S. Equal Employment Opportunity Commission, [1992].

[37] Sam Corbett-Davies, Emma Pierson, Avi Feller, Sharad Goel, and Aziz Huq. 2017. Algorithmic decision making and the cost of fairness. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 797–806.

[38] Elliot Creager, David Madras, Joern-Henrik Jacobsen, Marissa Weis, Kevin Swersky, Toniann Pitassi, and Richard Zemel. 2019. Flexibly Fair Representation Learning by Disentanglement. In International Conference on Machine Learning. 1436–1445.

[39] Brian d’Alessandro, Cathy O’Neil, and Tom LaGatta. 2017. Conscientious classification: A data scientist’s guide to discrimination-aware classification. Big data 5, 2 (2017), 120–134.

[40] Shai Danziger, Jonathan Levav, and Liora Avnaim-Pesso. 2011. Extraneous factors in judicial decisions. Proceedings of the National Academy of Sciences 108, 17 (2011), 6889–6892.

[41] Julia Dressel and Hany Farid. 2018. The accuracy, fairness, and limits of predicting recidivism. Science Advances 4, 1 (2018). https://doi.org/10.1126/sciadv.aao5580 arXiv:https://advances.sciencemag.org/content/4/1/eaao5580.full.pdf

[42] Dheeru Dua and Casey Graff. 2017. UCI Machine Learning Repository. (2017). http://archive.ics.uci.edu/ml

[43] Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard Zemel. 2012. Fairness Through Awareness. In Proceedings of the 3rd Innovations in Theoretical Computer Science Conference (ITCS ’12). ACM, New York, NY, USA, 214–226. https://doi.org/10.1145/2090236.2090255

[44] Golnoosh Farnadi, Behrouz Babaki, and Lise Getoor. 2018. Fairness in Relational Domains. In Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society (AIES '18). ACM, New York, NY, USA, 108–114. https://doi.org/10.1145/3278721.3278733

[45] Joel Escudé Font and Marta R Costa-jussà. 2019. Equalizing Gender Biases in Neural Machine Translation with Word Embeddings Techniques. arXiv preprint arXiv:1901.03116 (2019).

[46] Batya Friedman and Helen Nissenbaum. 1996. Bias in Computer Systems. ACM Trans. Inf. Syst. 14, 3 (July 1996), 330–347. https://doi.org/10.1145/230538.230561

[47] Anna Fry, Thomas J Littlejohns, Cathie Sudlow, Nicola Doherty, Ligia Adamska, Tim Sprosen, Rory Collins, and Naomi E Allen. 2017. Comparison of Sociodemographic and Health-Related Characteristics of UK Biobank Participants With Those of the General Population. American Journal of Epidemiology 186, 9 (06 2017), 1026–1034. https://doi.org/10.1093/aje/kwx246

[48] Timnit Gebru, Jamie Morgenstern, Briana Vecchione, Jennifer Wortman Vaughan, Hanna Wallach, Hal Daumé III, and Kate Crawford. [n. d.]. Datasheets for Datasets. ([n. d.]).

[49] Naman Goel, Mohammad Yaghini, and Boi Faltings. 2018. Non-discriminatory machine learning through convex fairness criteria. In Thirty-Second AAAI Conference on Artificial Intelligence.

[50] Hila Gonen and Yoav Goldberg. 2019. Lipstick on a Pig: Debiasing Methods Cover up Systematic Gender Biases in Word Embeddings But do not Remove Them. arXiv preprint arXiv: https://arxiv.org/abs/1903.03862 (2019).

[51] Sandra González-Bailón, Ning Wang, Alejandro Rivero, Javier Borge-Holthoefer, and Yamir Moreno. 2014. Assessing the bias in samples of large online networks. Social Networks 38 (2014), 16–27.

[52] Susan T Gooden. 2015. Race and social equity: A nervous area of government. Routledge.

[53] Nina Grgic-Hlaca, Muhammad Bilal Zafar, Krishna P Gummadi, and Adrian Weller. 2016. The case for process fairness in learning: Feature selection for fair decision making. In NIPS Symposium on Machine Learning and the Law, Vol. 1. 2.

[54] S. Hajian and J. Domingo-Ferrer. 2013. A Methodology for Direct and Indirect Discrimination Prevention in Data Mining. IEEE Transactions on Knowledge and Data Engineering 25, 7 (July 2013), 1445–1459. https://doi.org/10.1109/TKDE.2012.72

[55] Moritz Hardt, Eric Price, Nati Srebro, et al. 2016. Equality of opportunity in supervised learning. In Advances in neural information processing systems. 3315–3323.

[56] Eszter Hargittai. 2007. Whose Space? Differences among Users and Non-Users of Social Network Sites. Journal of Computer-Mediated Communication 13, 1 (10 2007), 276–297. https://doi.org/10.1111/j.1083-6101.2007.00396.x arXiv:http://oup.prod.sis.lan/jcmc/article-pdf/13/1/276/22317170/jjcmcom0276.pdf

[57] Sarah Holland, Ahmed Hosny, Sarah Newman, Joshua Joseph, and Kasia Chmielinski. 2018. The dataset nutrition label: A framework to drive higher data quality standards. arXiv preprint arXiv: https://arxiv.org/abs/1805.03677 (2018).

[58] Ayanna Howard and Jason Borenstein. 2018. The ugly truth about ourselves and our robot creations: the problem of bias and social inequity. Science and engineering ethics 24, 5 (2018), 1521–1536.

[59] Lingxiao Huang and Nisheeth Vishnoi. 2019. Stable and Fair Classification. In International Conference on Machine Learning. 2879–2890.

[60] Ben Hutchinson and Margaret Mitchell. 2019. 50 Years of Test (Un) fairness: Lessons for Machine Learning. In Proceedings of the Conference on Fairness, Accountability, and Transparency. ACM, 49–58.

[61] L. Introna and H. Nissenbaum. 2000. Defining the Web: the politics of search engines. Computer 33, 1 (Jan 2000), 54–62. https://doi.org/10.1109/2.816269

[62] Ayush Jaiswal, Yue Wu, Wael AbdAlmageed, and Premkumar Natarajan. 2018. Unsupervised Adversarial Invariance. (2018). arXiv: https://arxiv.org/abs/cs.LG/1809.10083

[63] Faisal Kamiran and Toon Calders. 2012. Data preprocessing techniques for classification without discrimination. Knowledge and Information Systems 33, 1 (01 Oct 2012), 1–33. https://doi.org/10.1007/s10115-011-0463-8

[64] Faisal Kamiran and Indrė Žliobaitė. 2013. Explainable and Non-explainable Discrimination in Classification. Springer Berlin Heidelberg, Berlin, Heidelberg, 155–170. https://doi.org/10.1007/978-3-642-30487-3_8

[65] Toshihiro Kamishima, Shotaro Akaho, Hideki Asoh, and Jun Sakuma. 2012. Fairness-aware classifier with prejudice remover regularizer. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 35–50.

[66] Michael Kearns, Seth Neel, Aaron Roth, and Zhiwei Steven Wu. 2018. Preventing Fairness Gerrymandering: Auditing and Learning for Subgroup Fairness. In International Conference on Machine Learning. 2569–2577.

[67] Michael Kearns, Seth Neel, Aaron Roth, and Zhiwei Steven Wu. 2019. An empirical study of rich subgroup fairness for machine learning. In Proceedings of the Conference on Fairness, Accountability, and Transparency. ACM, 100–109.

[68] Rogier Kievit, Willem Eduard Frankenhuis, Lourens Waldorp, and Denny Borsboom. 2013. Simpson’s paradox in psychological science: a practical guide. Frontiers in psychology 4 (2013), 513.

[69] Niki Kilbertus, Mateo Rojas Carulla, Giambattista Parascandolo, Moritz Hardt, Dominik Janzing, and Bernhard Schölkopf. 2017. Avoiding discrimination through causal reasoning. In Advances in Neural Information Processing Systems. 656–666.

[70] Jon Kleinberg, Sendhil Mullainathan, and Manish Raghavan. 2016. Inherent trade-offs in the fair determination of risk scores. arXiv preprint arXiv:1609.05807 (2016).

[71] Philipp Koehn. 2005. Europarl: A parallel corpus for statistical machine translation. In MT summit, Vol. 5. 79–86.

[72] Emmanouil Krasanakis, Eleftherios Spyromitros-Xioufis, Symeon Papadopoulos, and Yiannis Kompatsiaris. 2018. Adaptive Sensitive Reweighting to Mitigate Bias in Fairness-aware Classification. In Proceedings of the 2018 World Wide Web Conference (WWW ’18). International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland, 853–862. https://doi.org/10.1145/3178876.3186133

[73] Matt J Kusner, Joshua Loftus, Chris Russell, and Ricardo Silva. 2017. Counterfactual Fairness. In Advances in Neural Information Processing Systems 30, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.). Curran Associates, Inc., 4066–4076. http://papers.nips.cc/paper/6995-counterfactual-fairness.pdf

[74] Anja Lambrecht and Catherine E Tucker. 2018. Algorithmic bias? An empirical study into apparent gender-based discrimination in the display of STEM career ads. (March 9, 2018).

[75] J Larson, S Mattu, L Kirchner, and J Angwin. 2016. Compas analysis. GitHub, available at: https://github.com/propublica/compas-analysis (2016).

[76] Blake Lemoine, Brian Zhang, and M Mitchell. 2018. Mitigating Unwanted Biases with Adversarial Learning. (2018).

[77] Kristina Lerman. 2018. Computational social scientist beware: Simpson's paradox in behavioral data. Journal of Computational Social Science 1, 1 (2018), 49–58.

[78] Kristina Lerman and Tad Hogg. 2014. Leveraging position bias to improve peer recommendation. PloS one 9, 6 (2014), e98914.

[79] Zachary C Lipton, Alexandra Chouldechova, and Julian McAuley. 2017. Does mitigating ML's disparate impact require disparate treatment? stat 1050 (2017), 19.

[80] Lydia T Liu, Sarah Dean, Esther Rolf, Max Simchowitz, and Moritz Hardt. 2018. Delayed Impact of Fair Machine Learning. In Proceedings of the 35th International Conference on Machine Learning.

[81] Joshua R Loftus, Chris Russell, Matt J Kusner, and Ricardo Silva. 2018. Causal reasoning for algorithmic fairness. arXiv preprint arXiv:1805.05859 (2018).

[82] Christos Louizos, Kevin Swersky, Yujia Li, Max Welling, and Richard Zemel. 2016. The Variational Fair Autoencoder. stat 1050 (2016), 4.

[83] Arjun K. Manrai, Birgit H. Funke, Heidi L. Rehm, Morten S. Olesen, Bradley A. Maron, Peter Szolovits, David M. Margulies, Joseph Loscalzo, and Isaac S. Kohane. 2016. Genetic Misdiagnoses and the Potential for Health Disparities. New England Journal of Medicine 375, 7 (2016), 655–665. https://doi.org/10.1056/NEJMsa1507092 PMID: 27532831.

[84] Chandler May, Alex Wang, Shikha Bordia, Samuel R Bowman, and Rachel Rudinger. 2019. On measuring social biases in sentence encoders. arXiv preprint arXiv: https://arxiv.org/abs/1903.10561 (2019).

[85] Ninareh Mehrabi, Fred Morstatter, Nanyun Peng, and Aram Galstyan. 2019. Debiasing Community Detection: The Importance of Lowly-Connected Nodes. arXiv preprint arXiv: https://arxiv.org/abs/1903.08136 (2019).

[86] Aditya Krishna Menon and Robert C Williamson. 2018. The cost of fairness in binary classification. In Proceedings of the 1st Conference on Fairness, Accountability and Transparency (Proceedings of Machine Learning Research), Sorelle A. Friedler and Christo Wilson (Eds.), Vol. 81. PMLR, New York, NY, USA, 107–118. http://proceedings.mlr.press/v81/menon18a.html

[87] Michele Merler, Nalini Ratha, Rogerio S Feris, and John R Smith. 2019. Diversity in Faces. arXiv preprint arXiv: https://arxiv.org/abs/1901.10436 (2019).

[88] Hannah Jean Miller, Jacob Thebault-Spieker, Shuo Chang, Isaac Johnson, Loren Terveen, and Brent Hecht. 2016. "Blissfully Happy" or "Ready to Fight": Varying Interpretations of Emoji. In Tenth International AAAI Conference on Web and Social Media.

[89] I Minchev, G Matijevic, DW Hogg, G Guiglion, M Steinmetz, F Anders, C Chiappini, M Martig, A Queiroz, and C Scannapieco. 2019. Yule-Simpson’s paradox in Galactic Archaeology. arXiv preprint arXiv: https://arxiv.org/abs/1902.01421 (2019).

[90] Margaret Mitchell, Simone Wu, Andrew Zaldivar, Parker Barnes, Lucy Vasserman, Ben Hutchinson, Elena Spitzer, Inioluwa Deborah Raji, and Timnit Gebru. 2019. Model Cards for Model Reporting. In Proceedings of the Conference on Fairness, Accountability, and Transparency (FAT* ’19). ACM, New York, NY, USA, 220–229. https://doi.org/10.1145/3287560.3287596

[91] Fred Morstatter, Jürgen Pfeffer, Huan Liu, and Kathleen M Carley. 2013. Is the sample good enough? Comparing data from twitter’s streaming API with Twitter’s firehose. In 7th International AAAI Conference on Weblogs and Social Media, ICWSM 2013. AAAI press.

[92] Daniel Moyer, Shuyang Gao, Rob Brekelmans, Aram Galstyan, and Greg Ver Steeg. 2018. Invariant Representations without Adversarial Training. In Advances in Neural Information Processing Systems. 9084–9093.

[93] Amitabha Mukerjee, Rita Biswas, Kalyanmoy Deb, and Amrit P Mathur. 2002. Multi-objective evolutionary algorithms for the risk-return trade-off in bank loan management. International Transactions in Operational Research 9, 5 (2002), 583–597.

[94] Razieh Nabi, Daniel Malinsky, and Ilya Shpitser. 2018. Learning Optimal Fair Policies. arXiv preprint arXiv: https://arxiv.org/abs/1809.02244 (2018).

[95] Razieh Nabi and Ilya Shpitser. 2018. Fair inference on outcomes. In Thirty-Second AAAI Conference on Artificial Intelligence.

[96] Azadeh Nematzadeh, Giovanni Luca Ciampaglia, Filippo Menczer, and Alessandro Flammini. 2017. How algorithmic popularity bias hinders or promotes quality. arXiv preprint arXiv: https://arxiv.org/abs/1707.00574 (2017).

[97] Dong-Phuong Nguyen, Rilana Gravel, Rudolf Berend Trieschnigg, and Theo Meder. 2013. "How old do you think I am?": A study of language and age in Twitter. In Proceedings of the Seventh International AAAI Conference on Weblogs and Social Media, ICWSM 2013. AAAI Press, 439–448. eemcs-eprint-23604.

[98] Anne O’Keeffe and Michael McCarthy. 2010. The Routledge handbook of corpus linguistics. Routledge.

[99] Alexandra Olteanu, Carlos Castillo, Fernando Diaz, and Emre Kiciman. 2016. Social data: Biases, methodological pitfalls, and ethical boundaries. (2016).

[100] Cathy O’Neil. 2016. Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy. Crown Publishing Group, New York, NY, USA.

[101] Osonde A Osoba and William Welser IV. 2017. An intelligence in our image: The risks of bias and errors in artificial intelligence. Rand Corporation.

[102] Edmund S Phelps. 1972. The statistical theory of racism and sexism. The american economic review 62, 4 (1972), 659–661.

[103] Geoff Pleiss, Manish Raghavan, Felix Wu, Jon Kleinberg, and Kilian Q Weinberger. 2017. On Fairness and Calibration. In Advances in Neural Information Processing Systems 30, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.). Curran Associates, Inc., 5680–5689. http://papers.nips.cc/paper/7151-on-fairness-and-calibration.pdf

[104] Marcelo OR Prates, Pedro H Avelar, and Luís C Lamb. 2018. Assessing gender bias in machine translation: a case study with Google Translate. Neural Computing and Applications (2018), 1–19.

[105] Bilal Qureshi, Faisal Kamiran, Asim Karim, and Salvatore Ruggieri. 2016. Causal discrimination discovery through propensity score analysis. arXiv preprint arXiv: https://arxiv.org/abs/1608.03735 (2016).

[106] Inioluwa Deborah Raji and Joy Buolamwini. 2019. Actionable auditing: Investigating the impact of publicly naming biased performance results of commercial AI products.

[107] M Redmond. 2011. Communities and crime unnormalized data set. UCI Machine Learning Repository. http://www.ics.uci.edu/mlearn/MLRepository.html (2011).

[108] Lauren A Rivera. 2012. Hiring as cultural matching: The case of elite professional service firms. American sociological review 77, 6 (2012), 999–1022.

[109] Rachel Rudinger, Jason Naradowsky, Brian Leonard, and Benjamin Van Durme. 2018. Gender Bias in Coreference Resolution. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers). Association for Computational Linguistics, New Orleans, Louisiana, 8–14. https://doi.org/10.18653/v1/N18-2002

[110] Pedro Saleiro, Benedict Kuester, Abby Stevens, Ari Anisfeld, Loren Hinkson, Jesse London, and Rayid Ghani. 2018. Aequitas: A Bias and Fairness Audit Toolkit. arXiv preprint arXiv: https://arxiv.org/abs/1811.05577 (2018).

[111] Samira Samadi, Uthaipon Tantipongpipat, Jamie Morgenstern, Mohit Singh, and Santosh Vempala. 2018. The Price of Fair PCA: One Extra Dimension. In Proceedings of the 32nd International Conference on Neural Information Processing Systems (NIPS'18). Curran Associates Inc., USA, 10999–11010. http://dl.acm.org/citation.cfm?id=3327546.3327755

[112] Nripsuta Ani Saxena. 2019. Perceptions of Fairness. In Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society (AIES '19). ACM, New York, NY, USA, 537–538. https://doi.org/10.1145/3306618.3314314

[113] Nripsuta Ani Saxena, Karen Huang, Evan DeFilippis, Goran Radanovic, David C Parkes, and Yang Liu. 2019. How Do Fairness Definitions Fare?: Examining Public Attitudes Towards Algorithmic Definitions of Fairness. In Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society. ACM, 99–106.

[114] Tobias Schnabel, Adith Swaminathan, Ashudeep Singh, Navin Chandak, and Thorsten Joachims. 2016. Recom- mendations as Treatments: Debiasing Learning and Evaluation. In International Conference on Machine Learning. 1670–1679.

[115] Andrew D Selbst, Danah Boyd, Sorelle A Friedler, Suresh Venkatasubramanian, and Janet Vertesi. 2019. Fairness and abstraction in sociotechnical systems. In Proceedings of the Conference on Fairness, Accountability, and Transparency. ACM, 59–68.

[116] Shreya Shankar, Yoni Halpern, Eric Breck, James Atwood, Jimbo Wilson, and D Sculley. 2017. No Classification without Representation: Assessing Geodiversity Issues in Open Data Sets for the Developing World. stat 1050 (2017), 22.

[117] Richard Shaw and Manuel Corpas. [n. d.]. Further bias in personal genomics? ([n. d.]).

[118] Harini Suresh and John V Guttag. 2019. A Framework for Understanding Unintended Consequences of Machine Learning. arXiv preprint arXiv: https://arxiv.org/abs/1901.10002 (2019).

[119] Songül Tolan, Marius Miron, Emilia Gómez, and Carlos Castillo. 2019. Why Machine Learning May Lead to Unfairness: Evidence from Risk Assessment for Juvenile Justice in Catalonia. (2019).

[120] Zeynep Tufekci. 2014. Big questions for social media big data: Representativeness, validity and other methodological pitfalls. In Eighth International AAAI Conference on Weblogs and Social Media.

[121] Berk Ustun, Yang Liu, and David Parkes. 2019. Fairness without Harm: Decoupled Classifiers with Preference Guarantees. In Proceedings of the 36th International Conference on Machine Learning (Proceedings of Machine Learning Research), Kamalika Chaudhuri and Ruslan Salakhutdinov (Eds.), Vol. 97. PMLR, Long Beach, California, USA, 6373–6382. http://proceedings.mlr.press/v97/ustun19a.html

[122] Eva Vanmassenhove, Christian Hardmeier, and Andy Way. 2018. Getting gender right in neural machine translation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. 3003–3008.

[123] Sahil Verma and Julia Rubin. 2018. Fairness definitions explained. In 2018 IEEE/ACM International Workshop on Software Fairness (FairWare). IEEE, 1–7.

[124] Selwyn Vickers, Mona Fouad, and Moon S Chen Jr. 2014. Enhancing Minority Participation in Clinical Trials (EMPaCT): laying the groundwork for improving minority clinical trial accrual. Cancer 120 (2014), vi–vii.

[125] Ting Wang and Dashun Wang. 2014. Why Amazon’s ratings might mislead you: The story of herding effects. Big data 2, 4 (2014), 196–204.

[126] Christo Wilson, Bryce Boe, Alessandra Sala, Krishna PN Puttaswamy, and Ben Y Zhao. 2009. User interactions in social networks and their implications. In Proceedings of the 4th ACM European conference on Computer systems. Acm, 205–218.

[127] Blake Woodworth, Suriya Gunasekar, Mesrob I Ohannessian, and Nathan Srebro. 2017. Learning non-discriminatory predictors. arXiv preprint arXiv: https://arxiv.org/abs/1702.06081 (2017).

[128] Yongkai Wu, Lu Zhang, and Xintao Wu. 2018. Fairness-aware Classification: Criterion, Convexity, and Bounds. (2018). arXiv: https://arxiv.org/abs/cs.LG/1809.04737

[129] Depeng Xu, Shuhan Yuan, Lu Zhang, and Xintao Wu. 2018. Fairgan: Fairness-aware generative adversarial networks. In 2018 IEEE International Conference on Big Data (Big Data). IEEE, 570–575.

[130] Irene Y Chen, Peter Szolovits, and Marzyeh Ghassemi. 2019. Can AI Help Reduce Disparities in General Medical and Mental Health Care? AMA journal of ethics 21 (02 2019), E167–179. https://doi.org/10.1001/amajethics.2019.167

[131] Muhammad Bilal Zafar, Isabel Valera, Manuel Gomez Rodriguez, and Krishna P Gummadi. 2015. Fairness constraints: Mechanisms for fair classification. arXiv preprint arXiv: https://arxiv.org/abs/1507.05259 (2015).

[132] Lu Zhang and Xintao Wu. 2017. Anti-discrimination learning: a causal modeling-based framework. International Journal of Data Science and Analytics 4, 1 (01 Aug 2017), 1–16. https://doi.org/10.1007/s41060-017-0058-x

[133] Lu Zhang, Yongkai Wu, and Xintao Wu. 2016. On Discrimination Discovery Using Causal Networks. In Social, Cultural, and Behavioral Modeling, Kevin S. Xu, David Reitter, Dongwon Lee, and Nathaniel Osgood (Eds.). Springer International Publishing, Cham, 83–93.

[134] Lu Zhang, Yongkai Wu, and Xintao Wu. 2016. Situation Testing-based Discrimination Discovery: A Causal Inference Approach. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence (IJCAI’16). AAAI Press, 2718–2724. http://dl.acm.org/citation.cfm?id=3060832.3061001

[135] Lu Zhang, Yongkai Wu, and Xintao Wu. 2017. Achieving non-discrimination in data release. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 1335–1344.

[136] Lu Zhang, Yongkai Wu, and Xintao Wu. 2017. A Causal Framework for Discovering and Removing Direct and Indirect Discrimination. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI-17. 3929–3935. https://doi.org/10.24963/ijcai.2017/549

[137] L. Zhang, Y. Wu, and X. Wu. 2018. Causal Modeling-Based Discrimination Discovery and Removal: Criteria, Bounds, and Algorithms. IEEE Transactions on Knowledge and Data Engineering (2018), 1–1. https://doi.org/10.1109/TKDE.2018.2872988

[138] Jieyu Zhao, Tianlu Wang, Mark Yatskar, Ryan Cotterell, Vicente Ordonez, and Kai-Wei Chang. 2019. Gender Bias in Contextualized Word Embeddings. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 629–634.

[139] Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente Ordonez, and Kai-Wei Chang. 2017. Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing.

[140] Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente Ordonez, and Kai-Wei Chang. 2018. Gender Bias in Coreference Resolution: Evaluation and Debiasing Methods. (2018). arXiv: https://arxiv.org/abs/cs.CL/1804.06876

[141] Jieyu Zhao, Yichao Zhou, Zeyu Li, Wei Wang, and Kai-Wei Chang. 2018. Learning Gender-Neutral Word Embeddings. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. 4847–4853.

[142] James Zou and Londa Schiebinger. 2018. AI can be sexist and racist – it's time to make it fair. (2018).

2 Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour

GPU

Batch Normalization (BN)
NVIDIA Collective Communication Library (NCCL)
stochastic gradient descent (SGD)
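For orientation, the core recipe of the paper is the linear scaling rule with gradual warmup (a paraphrase, not a quote): when the minibatch size is multiplied by k, multiply the learning rate by k,

\eta = k \, \eta_0 \quad \text{when the minibatch size grows from } n \text{ to } kn,

ramping the rate up from \eta_0 to k\eta_0 over the first few epochs to avoid early divergence.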

References

[1] J. Bagga, H. Morsy, and Z. Yao. Opening designs for 6-pack and Wedge 100. https://code.facebook.com/posts/203733993317833/opening-designs-for-6-pack-and-wedge-100, 2016.

[2] M. Barnett, L. Shuler, R. van De Geijn, S. Gupta, D. G. Payne, and J. Watts. Interprocessor collective communication library (intercom). In Scalable High-Performance Computing Conference, 1994.

[3] L. Bottou. Curiously fast convergence of some stochastic gradient descent algorithms. Unpublished open problem offered to the attendance of the SLDS 2009 conference, 2009.

[4] L. Bottou, F. E. Curtis, and J. Nocedal. Optimization methods for large-scale machine learning. arXiv:1606.04838, 2016.

[5] J. Chen, X. Pan, R. Monga, S. Bengio, and R. Jozefowicz. Revisiting Distributed Synchronous SGD. arXiv:1604.00981, 2016.

[6] K. Chen and Q. Huo. Scalable training of deep learning machines by incremental block training with intra-block parallel optimization and blockwise model-update filtering. In ICASSP, 2016.

[7] R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, and P. Kuksa. Natural language processing (almost) from scratch. JMLR, 2011.

[8] J. Donahue, Y. Jia, O. Vinyals, J. Hoffman, N. Zhang, E. Tzeng, and T. Darrell. Decaf: A deep convolutional activation feature for generic visual recognition. In ICML, 2014.

[9] R. Girshick. Fast R-CNN. In ICCV, 2015.

[10] R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR, 2014.

[11] W. Gropp, E. Lusk, and A. Skjellum. Using MPI: Portable Parallel Programming with the Message-Passing Interface. MIT Press, Cambridge, MA, 1999.

[12] S. Gross and M. Wilber. Training and investigating Residual Nets. https://github.com/facebook/fb.resnet.torch, 2016.

[13] M. Gürbüzbalaban, A. Ozdaglar, and P. Parrilo. Why random reshuffling beats stochastic gradient descent. arXiv: https://arxiv.org/abs/1510.08560, 2015.

[14] K. He, G. Gkioxari, P. Dollár, and R. Girshick. Mask R-CNN. arXiv: https://arxiv.org/abs/1703.06870, 2017.

[15] K. He, X. Zhang, S. Ren, and J. Sun. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In ICCV, 2015.

[16] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In CVPR, 2016.

[17] G. Hinton, L. Deng, D. Yu, G. E. Dahl, A.-r. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. N. Sainath, et al. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine, 2012.

[18] I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, and Y. Bengio. Quantized neural networks: Training neural networks with low precision weights and activations. arXiv: https://arxiv.org/abs/1609.07061, 2016.

[19] S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In ICML, 2015.

[20] N. S. Keskar, D. Mudigere, J. Nocedal, M. Smelyanskiy, and P. T. P. Tang. On large-batch training for deep learning: Generalization gap and sharp minima. ICLR, 2017.

[21] A. Krizhevsky. One weird trick for parallelizing convolutional neural networks. arXiv:1404.5997, 2014.

[22] A. Krizhevsky, I. Sutskever, and G. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, 2012.

[23] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel. Backpropagation applied to handwritten zip code recognition. Neural computation, 1989.

[24] K. Lee. Introducing Big Basin: Our next-generation AI hardware. https://code.facebook.com/posts/1835166200089399/introducing-big-basin, 2017.

[25] M. Li. Scaling Distributed Machine Learning with System and Algorithm Co-design. PhD thesis, Carnegie Mellon University, 2017.

[26] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie. Feature pyramid networks for object detection. In CVPR, 2017.

[27] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft COCO: Common objects in context. In ECCV, 2014.

[28] J. Long, E. Shelhamer, and T. Darrell. Fully convolutional networks for semantic segmentation. In CVPR, 2015.

[29] Y. Nesterov. Introductory lectures on convex optimization: A basic course. Springer, 2004.

[30] R. Rabenseifner. Optimization of collective reduction operations. In ICCS. Springer, 2004.

[31] S. Ren, K. He, R. Girshick, and J. Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. In NIPS, 2015.

[32] H. Robbins and S. Monro. A stochastic approximation method. The annals of mathematical statistics, 1951.

[33] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei. ImageNet Large Scale Visual Recognition Challenge. IJCV, 2015.

[34] P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, and Y. LeCun. Overfeat: Integrated recognition, localization and detection using convolutional networks. In ICLR, 2014.

[35] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In ICLR, 2015.

[36] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In CVPR, 2015.

[37] R. Thakur, R. Rabenseifner, and W. Gropp. Optimization of collective communication operations in MPICH. IJHPCA, 2005.

[38] Y. Wu, M. Schuster, Z. Chen, Q. V. Le, M. Norouzi, W. Macherey, M. Krikun, Y. Cao, Q. Gao, K. Macherey, et al. Google’s neural machine translation system: Bridging the gap between human and machine translation. arXiv: https://arxiv.org/abs/1609.08144, 2016.

[39] S. Xie, R. Girshick, P. Dollár, Z. Tu, and K. He. Aggregated residual transformations for deep neural networks. In CVPR, 2017.

[40] W. Xiong, J. Droppo, X. Huang, F. Seide, M. Seltzer, A. Stolcke, D. Yu, and G. Zweig. The Microsoft 2016 Conversational Speech Recognition System. arXiv: https://arxiv.org/abs/1609.03528, 2016.

[41] M. D. Zeiler and R. Fergus. Visualizing and understanding convolutional neural networks. In ECCV, 2014.

3 Bayesian Uncertainty Estimation for Batch Normalized Deep Networks

Constant Uncertainty Dropout (CUDO)
Continuous Ranked Probability Score (CRPS)
cross-validation (CV)
Monte Carlo Dropout (MCDO)
Monte Carlo Batch Normalization (MCBN)
Multiplicative Normalizing Flows for variational Bayesian networks (MNF)
predictive log likelihood (PLL)
variational inference (VI)
Probabilistic backpropagation (PBP)
Kullback-Leibler (KL)
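The MCBN estimator itself is compact. As I read the paper, prediction runs T stochastic forward passes, each with batch-norm statistics taken from a different random training minibatch, and uses the moment estimates

\mu_*(x) \approx \frac{1}{T} \sum_{i=1}^{T} f_{\hat\omega_i}(x), \qquad \sigma_*^2(x) \approx \tau^{-1} + \frac{1}{T} \sum_{i=1}^{T} f_{\hat\omega_i}(x)^2 - \mu_*(x)^2,

where \hat\omega_i are the network weights under the i-th sampled batch statistics and \tau is the model precision (the same estimator MC Dropout uses, with batch-norm noise in place of dropout noise).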

References

Bui, T. D., Hernández-Lobato, D., Li, Y., Hernández-Lobato, J. M., and Turner, R. E. Deep Gaussian Processes for Regression using Approximate Expectation Propagation. In ICML, 2016.

Chen, X., Kundu, K., Zhang, Z., Ma, H., Fidler, S., and Urtasun, R. Monocular 3d object detection for autonomous driving. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2147–2156, 2016.

Djuric, U., Zadeh, G., Aldape, K., and Diamandis, P. Precision histology: how deep learning is poised to revitalize histomorphology for personalized cancer care. npj Precision Oncology, 1(1):22, 2017.

Esteva, A., Kuprel, B., Novoa, R. A., Ko, J., Swetter, S. M., Blau, H. M., and Thrun, S. Dermatologist-level classification of skin cancer with deep neural networks. Nature, Feb 2017.

Gal, Y. Uncertainty in Deep Learning. PhD thesis, University of Cambridge, 2016.

Gal, Y. and Ghahramani, Z. Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning. ICML, 48:1–10, 2015.

Ghahramani, Z. Delve Datasets. University of Toronto, 1996. URL http://www.cs.toronto.edu/~delve/data/kin/desc.html.

Ghahramani, Z. Probabilistic machine learning and artificial intelligence. Nature, 521(7553):452–459, May 2015.

Gneiting, T. and Raftery, A. E. Strictly Proper Scoring Rules, Prediction, and Estimation. Journal of the American Statistical Association, 102(477):359–378, 2007.

Goodfellow, I. J., Shlens, J., and Szegedy, C. Explaining and harnessing adversarial examples. arXiv preprint arXiv: https://arxiv.org/abs/1412.6572, 2014.

Graves, A. Practical Variational Inference for Neural Networks. NIPS, 2011.

Hernández-Lobato, J. M. and Adams, R. Probabilistic backpropagation for scalable learning of Bayesian neural networks. In International Conference on Machine Learning, pp. 1861–1869, 2015.

Hinton, G. E. and Van Camp, D. Keeping the neural networks simple by minimizing the description length of the weights. In Proceedings of the sixth annual conference on Computational learning theory, pp. 5–13. ACM, 1993.

Ioffe, S. Batch renormalization: Towards reducing minibatch dependence in batch-normalized models. CoRR, abs/1702.03275, 2017. URL http://arxiv.org/abs/1702.03275.

Ioffe, S. and Szegedy, C. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. arXiv, 2015. URL http://arxiv.org/abs/1502.03167.

Karpathy, A. Convnetjs demo: toy 1d regression, 2015. URL http://cs.stanford.edu/people/karpathy/convnetjs/demo/regression.html.

Kendall, A., Badrinarayanan, V., and Cipolla, R. Bayesian SegNet: Model Uncertainty in Deep Convolutional Encoder-Decoder Architectures for Scene Understanding. CoRR, abs/1511.02680, 2015. URL http://arxiv.org/abs/1511.02680.

Kingma, D. P. and Welling, M. Auto-Encoding Variational Bayes. In ICLR, 2014.

Krizhevsky, A. and Hinton, G. Learning multiple layers of features from tiny images. 2009.

Krueger, D., Huang, C.-W., Islam, R., Turner, R., Lacoste, A., and Courville, A. Bayesian hypernetworks. arXiv preprint arXiv: https://arxiv.org/abs/1710.04759, 2017.

Lehmann, E. L. Elements of Large-Sample Theory. Springer Verlag, New York, 1999. ISBN 0387985956.

Li, Y. and Gal, Y. Dropout Inference in Bayesian Neural Networks with Alpha-divergences. arXiv: https://arxiv.org/abs/1703.02914, 2017.

Louizos, C. and Welling, M. Multiplicative normalizing flows for variational Bayesian neural networks. In Precup, D. and Teh, Y. W. (eds.), Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pp. 2218–2227, International Convention Centre, Sydney, Australia, 06–11 Aug 2017. PMLR. URL http://proceedings.mlr.press/v70/louizos17a.html.

MacKay, D. J. A practical Bayesian framework for backpropagation networks. Neural computation, 4(3):448–472, 1992.

Neal, R. M. Bayesian Learning for Neural Networks. PhD thesis, University of Toronto, 1995.

Neal, R. M. Bayesian learning for neural networks, volume 118. Springer Science & Business Media, 2012.

Selten, R. Axiomatic characterization of the quadratic scoring rule. Experimental Economics, 1(1):43–62, 1998.

Shen, L. End-to-end training for whole image breast cancer diagnosis using an all convolutional design. arXiv preprint arXiv: https://arxiv.org/abs/1708.09427, 2017.

University of California, Irvine. UC Irvine Machine Learning Repository, 2017. URL https://archive.ics.uci.edu/ml/index.html.

Wang, S. I. and Manning, C. D. Fast dropout training. Proceedings of the 30th International Conference on Machine Learning, 28:118–126, 2013. URL http://machinelearning.wustl.edu/mlpapers/papers/wang13a.

4 Certifying and removing disparate impact

nongovernmental organization (NGO)
balanced error rate (BER)
Disparate Impact (DI)
US Equal Employment Opportunity Commission (EEOC)
Logistic Regression (LR)
Regularized Logistic Regression (RLR)
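For reference, the disparate impact measure the paper certifies formalizes the EEOC's 80% rule: with protected attribute X and outcome C,

DI = \frac{\Pr(C = \text{YES} \mid X = 0)}{\Pr(C = \text{YES} \mid X = 1)}, \qquad \text{disparate impact when } DI \le \tau = 0.8.

The certification result then ties DI to the balanced error rate (BER) of any classifier that tries to predict X from the remaining attributes.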

References

[1] S. Barocas and A. D. Selbst. Big data's disparate impact. Technical report, available at SSRN: http://ssrn.com/abstract=2477899, 2014.

[2] T. Calders, F. Kamiran, and M. Pechenizkiy. Building classifiers with independency constraints. In ICDM Workshop Domain Driven Data Mining, pages 13–18, 2009.

[3] T. Calders and S. Verwer. Three naive Bayes approaches for discrimination-free classification. Data Mining journal; special issue with selected papers from ECML/PKDD, 2010.

[4] C. Dwork, M. Hardt, T. Pitassi, O. Reingold, and R. Zemel. Fairness through awareness. In Proc. of Innovations in Theoretical Computer Science, 2012.

[5] R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin. Liblinear: A library for large linear classification. J. of Machine Learning Research, 9:1871–1874, 2008.

[6] H. Hodson. No one in control: The algorithms that run our lives. New Scientist, Feb. 04, 2015.

[7] T. Joachims. A support vector method for multivariate performance measures. In Proc. of Intl. Conf. on Machine Learning, pages 377–384. ACM, 2005.

[8] F. Kamiran and T. Calders. Classifying without discriminating. In Proc. of the IEEE International Conference on Computer, Control and Communication, 2009.

[9] T. Kamishima, S. Akaho, H. Asoh, and J. Sakuma. Fairness-aware classifier with prejudice remover regularizer. Machine Learning and Knowledge Discovery in Databases, pages 35–50, 2012.

[10] T. Kamishima, S. Akaho, and J. Sakuma. Fairness aware learning through regularization approach. In Proc of. Intl. Conf. on Data Mining, pages 643–650, 2011.

[11] B. T. Luong, S. Ruggieri, and F. Turini. k-nn as an implementation of situation testing for discrimination discovery and prevention. In Proc. of Intl. Conf. on Knowledge Discovery and Data Mining, KDD ’11, pages 502–510, 2011.

[12] A. Menon, H. Narasimhan, S. Agarwal, and S. Chawla. On the statistical consistency of algorithms for binary classification under class imbalance. In Proc. 30th ICML, pages 603–611, 2013.

[13] W. Miao. Did the results of promotion exams have a disparate impact on minorities? Using statistical evidence in Ricci v. DeStefano. J. of Stat. Ed., 19(1), 2011.

[14] J. Pearl. Understanding Simpson's paradox. The American Statistician, 2014.

[15] D. Pedreschi, S. Ruggieri, and F. Turini. Integrating induction and deduction for finding evidence of discrimination. In Proc. of Intl. Conf. on Artificial Intelligence and Law, ICAIL ’09, pages 157–166, 2009.

[16] D. Pedreschi, S. Ruggieri, and F. Turini. A study of top-k measures for discrimination discovery. In Proc. of Symposium on Applied Computing, SAC ’12, pages 126–131, 2012.

[17] J. L. Peresie. Toward a coherent test for disparate impact discrimination. Indiana Law Journal, 84(3):Article 1, 2009.

[18] J. Podesta, P. Pritzker, E. J. Moniz, J. Holdren, and J. Zients. Big data: seizing opportunities, preserving values. Executive Office of the President, May 2014.

[19] A. Romei and S. Ruggieri. A multidisciplinary survey on discrimination analysis. The Knowledge Engineering Review, pages 1–57, April 3 2013.

[20] Supreme Court of the United States. Griggs v. Duke Power Co. 401 U.S. 424, March 8, 1971.

[21] Supreme Court of the United States. Watson v. Fort Worth Bank & Trust. 487 U.S. 977, 995, 1988.

[22] Supreme Court of the United States. Ricci v. DeStefano. 557 U.S. 557, 174, 2009.

[23] Texas House of Representatives. House bill 588. 75th Legislature, 1997.

[24] The Leadership Conference. Civil rights principles for the era of big data. http://www.civilrights.org/press/2014/civil-rights-principles-big-data.html, Feb. 27, 2014.

[25] The U.S. EEOC. Uniform guidelines on employee selection procedures, March 2, 1979.

[26] R. Zemel, Y. Wu, K. Swersky, T. Pitassi, and C. Dwork. Learning fair representations. In Proc. of Intl. Conf. on Machine Learning, pages 325–333, 2013.

[27] M.-J. Zhao, N. Edakunni, A. Pocock, and G. Brown. Beyond Fano’s inequality: bounds on the optimal F-score, BER, and cost-sensitive risk and their implications. J. of Machine Learning Research, 14(1):1033–1090, 2013.

5 Class-Balanced Loss Based on Effective Number of Samples

ResNets

Class-Balanced (CB) Loss
Convolutional Neural Networks (CNNs)
Softmax (SM)
Sigmoid (SGM)
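In short, the paper weights each class by the inverse of its effective number of samples: for class y with n_y samples and a hyperparameter \beta \in [0, 1),

E_{n_y} = \frac{1 - \beta^{n_y}}{1 - \beta}, \qquad \mathrm{CB}(p, y) = \frac{1 - \beta}{1 - \beta^{n_y}} \, \mathcal{L}(p, y),

where \mathcal{L} is a base loss (softmax cross-entropy, sigmoid, or focal loss in the paper's experiments).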

References

[1] The iNaturalist 2018 Competition Dataset. https://github.com/visipedia/inat_comp. 5, 6

[2] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, et al. Tensorflow: A system for large-scale machine learning. In OSDI, 2016. 6

[3] S. Bengio. Sharing representations for long tail computer vision problems. In ICMI, 2015. 1, 2

[4] M. Buda, A. Maki, and M. A. Mazurowski. A systematic study of the class imbalance problem in convolutional neural networks. Neural Networks, 2018. 1, 2

[5] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer. Smote: synthetic minority over-sampling technique. JAIR, 2002. 2

[6] Y. Cui, Y. Song, C. Sun, A. Howard, and S. Belongie. Large scale fine-grained categorization and domain-specific transfer learning. In CVPR, 2018. 2

[7] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. Imagenet: A large-scale hierarchical image database. In CVPR, 2009. 1

[8] Q. Dong, S. Gong, and X. Zhu. Class rectification hard mining for imbalanced deep learning. In ICCV, 2017. 2

[9] C. Drummond, R. C. Holte, et al. C4.5, class imbalance, and cost sensitivity: why under-sampling beats over-sampling. In ICML Workshop, 2003. 2

[10] C. Elkan. The foundations of cost-sensitive learning. In IJCAI, 2001. 2

[11] Y. Freund and R. E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of computer and system sciences, 1997. 2

[12] Y. Geifman and R. El-Yaniv. Deep active learning over the long tail. arXiv preprint arXiv: https://arxiv.org/abs/1711.00941, 2017. 1, 2

[13] P. Goyal, P. Dollár, R. Girshick, P. Noordhuis, L. Wesolowski, A. Kyrola, A. Tulloch, Y. Jia, and K. He. Accurate, large minibatch sgd: training imagenet in 1 hour. arXiv preprint arXiv: https://arxiv.org/abs/1706.02677, 2017. 6

[14] H. He, Y. Bai, E. A. Garcia, and S. Li. Adasyn: Adaptive synthetic sampling approach for imbalanced learning. In IEEE International Joint Conference on Neural Networks, 2008. 2

[15] H. He and E. A. Garcia. Learning from imbalanced data. IEEE Transactions on Knowledge & Data Engineering, 2008. 1

[16] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In CVPR, 2016. 1, 5, 9

[17] C. Huang, Y. Li, C. Change Loy, and X. Tang. Learning deep representation for imbalanced classification. In CVPR, 2016. 1, 2

[18] S. Janson. Random coverings in several dimensions. Acta Mathematica, 1986. 2, 3

[19] N. Japkowicz and S. Stephen. The class imbalance problem: A systematic study. Intelligent data analysis, 2002. 1

[20] H. Kahn and A. W. Marshall. Methods of reducing sample size in monte carlo computations. Journal of the Operations Research Society of America, 1953. 2

[21] M. G. Kendall et al. The advanced theory of statistics, 2nd Ed., 1946. 1

[22] S. H. Khan, M. Hayat, M. Bennamoun, F. A. Sohel, and R. Togneri. Cost-sensitive learning of deep feature representations from imbalanced data. IEEE transactions on neural networks and learning systems, 2018. 2

[23] P.W. Koh and P. Liang. Understanding black-box predictions via influence functions. In ICML, 2017. 2

[24] A. Krizhevsky and G. Hinton. Learning multiple layers of features from tiny images. Technical report, Citeseer, 2009. 1, 5, 9

[25] A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In Neural Information Processing Systems, 2012. 1

[26] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár. Focal loss for dense object detection. PAMI, 2018. 2, 5, 6

[27] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft COCO: Common objects in context. In ECCV, 2014. 1

[28] D. Mahajan, R. Girshick, V. Ramanathan, K. He, M. Paluri, Y. Li, A. Bharambe, and L. van der Maaten. Exploring the limits of weakly supervised pretraining. In ECCV, 2018. 2

[29] T. Malisiewicz, A. Gupta, and A. A. Efros. Ensemble of exemplar-svms for object detection and beyond. In ICCV, 2011. 2

[30] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean. Distributed representations of words and phrases and their compositionality. In Neural Information Processing Systems, 2013. 2

[31] W. Ouyang, X. Wang, C. Zhang, and X. Yang. Factors in finetuning deep model for object detection with long-tail distribution. In CVPR, 2016. 1, 2

[32] M. Ren, W. Zeng, B. Yang, and R. Urtasun. Learning to reweight examples for robust deep learning. In ICML, 2018. 2

[33] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al. Imagenet large scale visual recognition challenge. IJCV, 2015. 1, 5, 6

[34] N. Sarafianos, X. Xu, and I. A. Kakadiaris. Deep imbalanced attribute classification using visual attention aggregation. In ECCV, 2018. 2

[35] L. Shen, Z. Lin, and Q. Huang. Relay backpropagation for effective learning of deep convolutional neural networks. In ECCV, 2016. 2

[36] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv: https://arxiv.org/abs/1409.1556, 2014. 1

[37] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In CVPR, 2015. 1

[38] K. M. Ting. A comparative study of cost-sensitive boosting algorithms. In ICML, 2000. 2

[39] A. Torralba, R. Fergus, and W. T. Freeman. 80 million tiny images: A large data set for nonparametric object and scene recognition. PAMI, 2008. 1

[40] G. Van Horn, O. Mac Aodha, Y. Song, Y. Cui, C. Sun, A. Shepard, H. Adam, P. Perona, and S. Belongie. The inaturalist species classification and detection dataset. In CVPR, 2018. 1, 5, 6

[41] G. Van Horn and P. Perona. The devil is in the tails: Fine-grained classification in the wild. arXiv preprint arXiv: https://arxiv.org/abs/1709.01450, 2017. 1

[42] C. Wah, S. Branson, P. Welinder, P. Perona, and S. Belongie. The caltech-ucsd birds-200-2011 dataset. California Institute of Technology, 2011. 1

[43] Y.-X. Wang, D. Ramanan, and M. Hebert. Learning to model the tail. In Neural Information Processing Systems, 2017. 1, 2

[44] X. Yin, X. Yu, K. Sohn, X. Liu, and M. Chandraker. Feature transfer learning for deep face recognition with long-tail data. arXiv preprint arXiv: https://arxiv.org/abs/1803.09014, 2018. 1, 2

[45] C. You, C. Li, D. P. Robinson, and R. Vidal. A scalable exemplar-based subspace clustering algorithm for class-imbalanced data. In European Conference on Computer Vision, 2018. 2

[46] S. Zagoruyko and N. Komodakis. Wide residual networks. In BMVC, 2016. 6

[47] X. Zhang, Z. Fang, Y. Wen, Z. Li, and Y. Qiao. Range loss for deep face recognition with long-tailed training data. In CVPR, 2017. 1, 2

[48] B. Zhou, A. Lapedriza, A. Khosla, A. Oliva, and A. Torralba. Places: A 10 million image database for scene recognition. PAMI, 2017. 1

[49] Z.-H. Zhou and X.-Y. Liu. Training cost-sensitive neural networks with methods addressing the class imbalance problem. IEEE Transactions on Knowledge and Data Engineering, 2006. 2

[50] Y. Zou, Z. Yu, B. V. Kumar, and J. Wang. Unsupervised domain adaptation for semantic segmentation via class-balanced self-training. In ECCV, 2018. 2

6 Classification with Fairness Constraints: A Meta-Algorithm with Provable Guarantees

US
NYPD
optimization (OPT)

References

[1] ACM. Statement on algorithmic transparency and accountability. https://www.acm.org/binaries/content/assets/public-policy/2017_usacm_statement_algorithms.pdf, 2017.

[2] An Act. Civil rights act of 1964. Title VII, Equal Employment Opportunities, 1964.

[3] Alekh Agarwal, Alina Beygelzimer, Miroslav Dudík, John Langford, and Hanna M. Wallach. A reductions approach to fair classification. In Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018, pages 60–69, 2018.

[4] Julia Angwin, Jeff Larson, Surya Mattu, and Lauren Kirchner. https://github.com/propublica/compas-analysis, 2016.

[5] Julia Angwin, Jeff Larson, Surya Mattu, and Lauren Kirchner. Machine bias: There’s software used across the country to predict future criminals. and it’s biased against blacks. ProPublica, May, 2016.

[6] Solon Barocas and Andrew D Selbst. Big data’s disparate impact. California Law Review, 2016.

[7] Richard Berk. The role of race in forecasts of violent crime. Race and social problems, 2009.

[8] Stephen Boyd and Almir Mutapcic. Stochastic subgradient methods. Lecture Notes for EE364b, Stanford University, 2008.

[9] Toon Calders and Sicco Verwer. Three naive bayes approaches for discrimination-free classification. Data Min. Knowl. Discov., 21(2):277–292, 2010.

[10] L. Elisa Celis, Amit Deshpande, Tarun Kathuria, Damian Straszak, and Nisheeth K. Vishnoi. On the complexity of constrained determinantal point processes. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM 2017, August 16-18, 2017, Berkeley, CA, USA, pages 36:1–36:22, 2017.

[11] L. Elisa Celis, Amit Deshpande, Tarun Kathuria, and Nisheeth K Vishnoi. How to be fair and diverse? In Fairness, Accountability, and Transparency in Machine Learning, 2016.

[12] L Elisa Celis, Lingxiao Huang, and Nisheeth K Vishnoi. Multiwinner voting with fairness constraints. In Proceedings of the Twenty-seventh International Joint Conference on Artificial Intelligence and the Twenty-third European Conference on Artificial Intelligence, IJCAI-ECAI, 2018.

[13] L. Elisa Celis, Vijay Keswani, Amit Deshpande, Tarun Kathuria, Damian Straszak, and Nisheeth K. Vishnoi. Fair and diverse DPP-based data summarization. In ICML, 2018.

[14] L. Elisa Celis, Damian Straszak, and Nisheeth K. Vishnoi. Ranking with fairness constraints. In Proceedings of the Forty-Fifth International Colloquium on Automata, Languages, and Programming ICALP, 2018.

[15] L. Elisa Celis and Nisheeth K Vishnoi. Fair personalization. In Fairness, Accountability, and Transparency in Machine Learning, 2017.

[16] Alexandra Chouldechova. Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. CoRR, abs/1703.00056, 2017.

[17] Sam Corbett-Davies, Emma Pierson, Avi Feller, Sharad Goel, and Aziz Huq. Algorithmic decision making and the cost of fairness. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, August 13 - 17, 2017, pages 797–806, 2017.

[18] Amit Datta, Michael Carl Tschantz, and Anupam Datta. Automated experiments on ad privacy settings. Proceedings on Privacy Enhancing Technologies, 2015.

[19] Bill Dedman et al. The color of money. Atlanta Journal-Constitution, 1988.

[20] Dua Dheeru and Efi Karra Taniskidou. UCI machine learning repository. http://archive.ics.uci.edu/ml, 2017.

[21] William Dieterich, Christina Mendoza, and Tim Brennan. Compas risk scales: Demonstrating accuracy equity and predictive parity. Northpoint Inc, 2016.

[22] Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard Zemel. Fairness through awareness. In Innovations in Theoretical Computer Science 2012, Cambridge, MA, USA, January 8-10, 2012, pages 214–226. ACM, 2012.

[23] Cynthia Dwork, Nicole Immorlica, Adam Tauman Kalai, and Mark D. M. Leiserson. Decoupled classifiers for group-fair and efficient machine learning. In Fairness, Accountability, and Transparency in Machine Learning, pages 119–133, 2018.

[24] Enthought. SciPy. https://www.scipy.org/, 2018.

[25] Michael Feldman, Sorelle A Friedler, John Moeller, Carlos Scheidegger, and Suresh Venkatasubramanian. Certifying and removing disparate impact. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, NSW, Australia, August 10-13, 2015, pages 259–268. ACM, 2015.

[26] Benjamin Fish, Jeremy Kun, and Ádám D Lelkes. A confidence-based approach for balancing fairness and accuracy. In Proceedings of the 2016 SIAM International Conference on Data Mining, Miami, Florida, USA, May 5-7, 2016, pages 144–152. SIAM, 2016.

[27] Anthony W Flores, Kristin Bechtel, and Christopher T Lowenkamp. False positives, false negatives, and false analyses: A rejoinder to machine bias: There’s software used across the country to predict future criminals. and it’s biased against blacks. Fed. Probation, 80:38, 2016.

[28] Sorelle A Friedler, Carlos Scheidegger, and Suresh Venkatasubramanian. On the (im)possibility of fairness. arXiv preprint arXiv: https://arxiv.org/abs/1609.07236, 2016.

[29] Naman Goel, Mohammad Yaghini, and Boi Faltings. Non-discriminatory machine learning through convex fairness criteria. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, Louisiana, USA, February 2-7, 2018, 2018.

[30] Sharad Goel, Justin M Rao, Ravi Shroff, et al. Precinct or prejudice? understanding racial disparities in new york city’s stop-and-frisk policy. The Annals of Applied Statistics, 10(1):365–394, 2016.

[31] Gabriel Goh, Andrew Cotter, Maya R. Gupta, and Michael P. Friedlander. Satisfying real-world goals with dataset constraints. In Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, December 5-10, 2016, Barcelona, Spain, pages 2415–2423, 2016.

[32] Nina Grgic-Hlaca, Elissa M Redmiles, Krishna P Gummadi, and Adrian Weller. Human perceptions of fairness in algorithmic decision making: A case study of criminal risk prediction. In Proceedings of the 2018 World Wide Web Conference on World Wide Web, WWW 2018, Lyon, France, April 23-27, 2018, pages 903–912, 2018.

[33] Nina Grgic-Hlaca, Muhammad Bilal Zafar, Krishna P Gummadi, and Adrian Weller. Beyond distributive fairness in algorithmic decision making: Feature selection for procedurally fair learning. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, Louisiana, USA, February 2-7, 2018, 2018.

[34] Moritz Hardt, Eric Price, and Nati Srebro. Equality of opportunity in supervised learning. In Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, December 5-10, 2016, Barcelona, Spain, pages 3315–3323, 2016.

[35] Mara Hvistendahl. Can “predictive policing” prevent crime before it happens? Science AAAS, 2016.

[36] Matthew Joseph, Michael Kearns, Jamie H Morgenstern, and Aaron Roth. Fairness in learning: Classic and contextual bandits. In Advances in Neural Information Processing Systems, pages 325–333, 2016.

[37] Faisal Kamiran and Toon Calders. Classifying without discriminating. In Computer, Control and Communication, 2009. IC4 2009. 2nd International Conference on, pages 1–6. IEEE, 2009.

[38] Faisal Kamiran and Toon Calders. Data preprocessing techniques for classification without discrimination. Knowledge and Information Systems, 33(1):1–33, 2012.

[39] Toshihiro Kamishima, Shotaro Akaho, Hideki Asoh, and Jun Sakuma. Fairness-aware classifier with prejudice remover regularizer. In Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2012, Bristol, UK, September 24-28, 2012. Proceedings, Part II, pages 35–50, 2012.

[40] Michael Kearns, Aaron Roth, and Zhiwei Steven Wu. Meritocratic fairness for cross-population selection. In International Conference on Machine Learning, pages 1828–1836, 2017.

[41] Jon M. Kleinberg, Sendhil Mullainathan, and Manish Raghavan. Inherent trade-offs in the fair determination of risk scores. In 8th Innovations in Theoretical Computer Science Conference, ITCS 2017, January 9-11, 2017, Berkeley, CA, USA, pages 43:1–43:23, 2017.

[42] Emmanouil Krasanakis, Eleftherios Spyromitros-Xioufis, Symeon Papadopoulos, and Yiannis Kompatsiaris. Adaptive sensitive reweighting to mitigate bias in fairness-aware classification. In Proceedings of the 2018 World Wide Web Conference on World Wide Web, WWW 2018, Lyon, France, April 23-27, 2018. International World Wide Web Conferences Steering Committee, 2018.

[43] Jeff Larson, Surya Mattu, Lauren Kirchner, and Julia Angwin. How we analyzed the compas recidivism algorithm. ProPublica, May 2016.

[44] Binh Thanh Luong, Salvatore Ruggieri, and Franco Turini. k-nn as an implementation of situation testing for discrimination discovery and prevention. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, CA, USA, August 21-24, 2011, pages 502–510. ACM, 2011.

[45] Susan Magarey. The sex discrimination act 1984. Australian Feminist Law Journal, 2004.

[46] Michael W. Mahoney, Lorenzo Orecchia, and Nisheeth K. Vishnoi. A spectral algorithm for improving graph partitions. Journal of Machine Learning Research, 13:2339–2365, 2012.

[47] Subhransu Maji, Nisheeth K. Vishnoi, and Jitendra Malik. Biased normalized cuts. In The 24th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2011, Colorado Springs, CO, USA, 20-25 June 2011, pages 2057–2064, 2011.

[48] Aditya Krishna Menon and Robert C. Williamson. The cost of fairness in binary classification. In Conference on Fairness, Accountability and Transparency, FAT 2018, 23-24 February 2018, New York, NY, USA, pages 107–118, 2018.

[49] Claire Cain Miller. Can an algorithm hire better than a human? The New York Times, 25, 2015.

[50] Harikrishna Narasimhan, Rohit Vaish, and Shivani Agarwal. On the statistical consistency of plug-in classifiers for non-decomposable performance measures. In Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, December 8-13 2014, Montreal, Quebec, Canada, pages 1493–1501, 2014.

[51] Arvind Narayanan. Tutorial: 21 fairness definitions and their politics. https://www.youtube.com/watch?v=jIXIuYdnyyk, 2018.

[52] Northpointe. Compas risk and need assessment systems. http://www.northpointeinc.com/files/downloads/FAQ_Document.pdf, 2012.

[53] United States. Executive Office of the President and John Podesta. Big data: Seizing opportunities, preserving values. White House, Executive Office of the President, 2014.

[54] Dino Pedreshi, Salvatore Ruggieri, and Franco Turini. Discrimination-aware data mining. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Las Vegas, Nevada, USA, August 24-27, 2008, pages 560–568. ACM, 2008.

[55] Geoff Pleiss, Manish Raghavan, Felix Wu, Jon M. Kleinberg, and Kilian Q. Weinberger. On fairness and calibration. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA, pages 5684–5693, 2017.

[56] Novi Quadrianto and Viktoriia Sharmanska. Recycling privileged learning and distribution matching for fairness. In Advances in Neural Information Processing Systems, pages 677–688, 2017.

[57] White House. Big data: A report on algorithmic systems, opportunity, and civil rights. Executive Office of the President, 2016.

[58] Blake E. Woodworth, Suriya Gunasekar, Mesrob I. Ohannessian, and Nathan Srebro. Learning non-discriminatory predictors. In Proceedings of the 30th Conference on Learning Theory, COLT 2017, Amsterdam, The Netherlands, 7-10 July 2017, pages 1920–1953, 2017.

[59] Muhammad Bilal Zafar, Isabel Valera, Manuel Gomez-Rodriguez, and Krishna P. Gummadi. Fairness beyond disparate treatment & disparate impact: Learning classification without disparate mistreatment. In Proceedings of the 26th International Conference on World Wide Web, WWW 2017, Perth, Australia, April 3-7, 2017, pages 1171–1180, 2017.

[60] Muhammad Bilal Zafar, Isabel Valera, Manuel Gomez-Rodriguez, and Krishna P. Gummadi. Fairness constraints: Mechanisms for fair classification. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, AISTATS 2017, 20-22 April 2017, Fort Lauderdale, FL, USA, pages 962–970, 2017.

[61] Muhammad Bilal Zafar, Isabel Valera, Manuel Gomez-Rodriguez, Krishna P. Gummadi, and Adrian Weller. From parity to preference-based notions of fairness in classification. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA, pages 228–238, 2017.

[62] Rich Zemel, Yu Wu, Kevin Swersky, Toni Pitassi, and Cynthia Dwork. Learning fair representations. In Proceedings of the 30th International Conference on Machine Learning, ICML 2013, Atlanta, GA, USA, 16-21 June 2013, pages 325–333, 2013.

[63] Indre Zliobaite. Measuring discrimination in algorithmic decision making. Data Min. Knowl. Discov., 2017.

7 Cost-Sensitive Feature Selection by Optimizing F-measures

correlation-based feature selection (CFS) cost-sensitive feature selection (CSFS) Information-Theoretic Feature Ranking (ITFR) Minimum Redundancy Maximum Relevance (mRMR) Multi-Label ReliefF (MLReliefF) Non-Convex Feature Learning (NCFS) Robust Feature Selection (RFS) support vector machine recursive feature elimination (SVM-RFE)
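
Everything in this paper is driven by the F-measure, so a quick sketch of the quantity may help: F_β computed from confusion-matrix counts, plus the kind of held-out threshold search that cost-sensitive reductions of F-measure optimization typically rely on. This is a minimal illustration with names of my own, not the paper's code.

```python
import numpy as np

def f_beta(tp, fp, fn, beta=1.0):
    """F_beta = (1 + beta^2) * TP / ((1 + beta^2) * TP + beta^2 * FN + FP)."""
    b2 = beta ** 2
    den = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / den if den > 0 else 0.0

def best_threshold(scores, labels, beta=1.0):
    """Pick the decision threshold that maximizes F_beta on held-out data."""
    scores, labels = np.asarray(scores), np.asarray(labels)
    best_t, best_f = 0.5, -1.0
    for t in np.linspace(0.01, 0.99, 99):
        pred = scores >= t
        tp = np.sum(pred & (labels == 1))
        fp = np.sum(pred & (labels == 0))
        fn = np.sum(~pred & (labels == 1))
        f = f_beta(tp, fp, fn, beta)
        if f > best_f:
            best_t, best_f = t, f
    return best_t, best_f
```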

REFERENCES

[1] H. Lang and H. Ling, “Covert photo classification by fusing image features and visual attributes,” IEEE Transactions on Image Processing, vol. 24, no. 10, pp. 2996–3008, 2015.

[2] S. Bahrampour, N. M. Nasrabadi, A. Ray, and W. K. Jenkins, “Multimodal task-driven dictionary learning for image classification,” IEEE Transactions on Image Processing, vol. 25, no. 1, pp. 24–38, 2016.

[3] D. Wang, F. Nie, and H. Huang, “Feature selection via global redundancy minimization,” IEEE Transactions on Knowledge & Data Engineering, vol. PP, no. 99, pp. 2743–2755, 2015.

[4] F. Nie, S. Xiang, Y. Jia, C. Zhang, and S. Yan, “Trace ratio criterion for feature selection,” in National Conference on Artificial Intelligence, 2008, pp. 671–676.

[5] Y. Luo, T. Liu, D. Tao, and C. Xu, “Decomposition-based transfer distance metric learning for image classification,” IEEE Transactions on Image Processing, vol. 23, no. 9, pp. 3789–3801, 2014.

[6] M. J. Saberian and N. Vasconcelos, “Boosting algorithms for simultaneous feature extraction and selection,” in CVPR, 2012, pp. 2448–2455.

[7] M. D. Gupta and J. Xiao, “Non-negative matrix factorization as a feature selection tool for maximum margin classifiers,” in CVPR, 2011, pp. 2841–2848.

[8] A. Wang, J. Lu, J. Cai, G. Wang, and T.-J. Cham, “Unsupervised joint feature learning and encoding for rgb-d scene labeling,” IEEE Transactions on Image Processing, vol. 24, no. 11, pp. 4459–4473, 2015.

[9] Y. Yang, H. T. Shen, Z. Ma, Z. Huang, and X. Zhou, “ℓ2,1-norm regularized discriminative feature selection for unsupervised learning,” in IJCAI, 2011, pp. 1589–1594.

[10] K. Shin and A. P. Angulo, “A geometric theory of feature selection and distance-based measures,” in IJCAI, 2015, pp. 3812–3819.

[11] S. M. Villela, S. de Castro Leite, and R. F. Neto, “Feature selection from microarray data via an ordered search with projected margin,” in IJCAI, 2015, pp. 3874–3881.

[12] W. Jiang, G. Er, Q. Dai, and J. Gu, “Similarity-based online feature selection in content-based image retrieval,” IEEE Transactions on Image Processing, vol. 15, no. 3, pp. 702–712, 2006.

[13] Y. Luo, Y. Wen, D. Tao, J. Gui, and C. Xu, “Large margin multi-modal multi-task feature extraction for image classification,” IEEE Transactions on Image Processing, vol. 25, no. 1, pp. 414–427, 2016.

[14] H. Tao, C. Hou, F. Nie, Y. Jiao, and D. Yi, “Effective discriminative feature selection with nontrivial solution.” IEEE Transactions on Neural Networks & Learning Systems, vol. 27, no. 4, pp. 3013–3017, 2016.

[15] I. Kononenko, “Estimating attributes: analysis and extensions of RELIEF,” in ECML, 1994, pp. 171–182.

[16] H. Peng, F. Long, and C. Ding, “Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 8, pp. 1226–1238, 2005.

[17] H. Liu and H. Motoda, Feature selection for knowledge discovery and data mining. Springer Science & Business Media, 2012, vol. 454.

[18] L. E. Raileanu and K. Stoffel, “Theoretical comparison between the gini index and information gain criteria,” Annals of Mathematics and Artificial Intelligence, vol. 41, no. 1, pp. 77–93, 2004.

[19] M. A. Hall and L. A. Smith, “Feature selection for machine learning: Comparing a correlation-based filter approach to the wrapper.” in FLAIRS, 1999, pp. 235–239.

[20] I. Guyon, J. Weston, S. Barnhill, and V. Vapnik, “Gene selection for cancer classification using support vector machines,” Machine Learning, vol. 46, no. 1-3, pp. 389–422, 2002.

[21] F. Nie, H. Huang, X. Cai, and C. H. Ding, “Efficient and robust feature selection via joint ℓ2,1-norms minimization,” in NIPS, 2010, pp. 1813–1821.

[22] D. Han and J. Kim, “Unsupervised simultaneous orthogonal basis clustering feature selection,” in CVPR, 2015, pp. 5016–5023.

[23] J. J. Hull, “A database for handwritten text recognition research,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 16, no. 5, pp. 550–554, 1994.

[24] X. Chen and M. Wasikowski, “Fast: a roc-based feature selection metric for small samples and imbalanced data classification problems,” in Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2008, pp. 124–132.

[25] Q. Shi, B. Du, and L. Zhang, “Spatial coherence-based batch-mode active learning for remote sensing image classification,” IEEE Transactions on Image Processing, vol. 24, no. 7, pp. 2037–2050, 2015.

[26] J.-M. Guo and H. Prasetyo, “Content-based image retrieval using features extracted from halftoning-based block truncation coding,” IEEE Transactions on Image Processing, vol. 24, no. 3, pp. 1010–1024, 2015.

[27] Y. Chen and C. Lin, “Combining svms with various feature selection strategies,” in Feature Extraction. Springer, 2006, pp. 315–324.

[28] M. Qian and C. Zhai, “Robust unsupervised feature selection,” in IJCAI, 2013, pp. 1621–1627.

[29] Q. Gu, Z. Li, and J. Han, “Joint feature selection and subspace learning,” in IJCAI, 2011, pp. 1294–1299.

[30] Y. Zhang and Z.-H. Zhou, “Cost-sensitive face recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 10, pp. 1758–1769, 2010.

[31] S. P. Parambath, N. Usunier, and Y. Grandvalet, “Optimizing F-measures by cost-sensitive classification,” in NIPS, 2014, pp. 2123–2131.

[32] K. J. Dembczynski, W. Waegeman, W. Cheng, and E. Hüllermeier, “An exact algorithm for F-measure maximization,” in NIPS, 2011, pp. 1404–1412.

[33] I. Pillai, G. Fumera, and F. Roli, “F-measure optimisation in multi-label classifiers,” in ICPR, 2012, pp. 2424–2427.

[34] W. Cheng, K. Dembczyński, E. Hüllermeier, A. Jaroszewicz, and W. Waegeman, “F-measure maximization in topical classification,” in Rough Sets and Current Trends in Computing, 2012, pp. 439–446.

[35] K. J. Dembczynski, A. Jachnik, W. Kotlowski, W. Waegeman, and E. Hüllermeier, “Optimizing the F-measure in multi-label classification: Plug-in rule approach versus structured loss minimization,” in ICML, 2013, pp. 1130–1138.

[36] N. Ye, K. M. A. Chai, W. S. Lee, and H. L. Chieu, “Optimizing F-measures: a tale of two approaches,” in ICML, 2012.

[37] D. D. Lewis, “Evaluating and optimizing autonomous text classification systems,” in SIGIR, 1995, pp. 246–254.

[38] M. Jansche, “Maximum expected F-measure training of logistic regression models,” in Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing, 2005, pp. 692–699.

[39] I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun, “Large margin methods for structured and interdependent output variables,” JMLR, vol. 6, pp. 1453–1484, 2005.

[40] Y. Yang, “A study of thresholding strategies for text categorization,” in SIGIR, 2001, pp. 137–145.

[41] O. O. Koyejo, N. Natarajan, P. K. Ravikumar, and I. S. Dhillon, “Consistent binary classification with generalized performance metrics,” in NIPS, 2014, pp. 2744–2752.

[42] H. Narasimhan, R. Vaish, and S. Agarwal, “On the statistical consistency of plug-in classifiers for non-decomposable performance measures,” in NIPS, 2014, pp. 1493–1501.

[43] X. Zhu, H. I. Suk, and D. Shen, “Matrix-similarity based loss function and feature selection for Alzheimer’s disease diagnosis,” in CVPR, 2014, pp. 3089–3096.

[44] D. Kong and C. Ding, “Non-convex feature learning via ℓp,∞ operator,” in AAAI, 2014.

[45] M. Wasikowski and X. Chen, “Combating the small sample class imbalance problem using feature selection,” IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 10, pp. 1388–1400, 2010.

[46] J. Kim, Y. Wang, and Y. Yasunori, “The genia event extraction shared task, 2013 edition-overview,” in Proceedings of the BioNLP Shared Task 2013 Workshop, 2013, pp. 8–15.

[47] S. Xiang, F. Nie, G. Meng, C. Pan, and C. Zhang, “Discriminative least squares regression for multiclass classification and feature selection,” IEEE Transactions on Neural Networks & Learning Systems, vol. 23, no. 11, pp. 1738–1754, 2012.

[48] R. He, T. Tan, L. Wang, and W.-S. Zheng, “ℓ2,1 regularized correntropy for robust feature selection,” in CVPR, 2012, pp. 2504–2511.

[49] X. Cai, F. Nie, and H. Huang, “Exact top-k feature selection via ℓ2,0-norm constraint,” in IJCAI, 2013, pp. 1240–1246.

[50] D. Kong, C. Ding, H. Huang, and H. Zhao, “Multi-label ReliefF and F-statistic feature selections for image annotation,” in CVPR, 2012, pp. 2352–2359.

[51] J. Lee and D. Kim, “Fast multi-label feature selection based on information-theoretic feature ranking,” Pattern Recognition, vol. 48, no. 9, pp. 2761–2771, 2015.

[52] C. Elkan, “The foundations of cost-sensitive learning,” in International joint conference on artificial intelligence, vol. 17, no. 1, 2001, pp. 973–978.

8 Data Decision and Theoretical Implications when Adversarially Learning Fair Representations

ReLU UCI

deep neural network (DNN)
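
The setup analyzed here attaches an adversarial head, trained to recover the sensitive attribute, to the same representation that feeds the task head. A minimal PyTorch-style sketch of that objective with alternating updates follows; the layer sizes, optimizers, and the λ trade-off knob are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(32, 16), nn.ReLU())  # shared representation
task_head = nn.Linear(16, 2)   # predicts the target label y
adv_head = nn.Linear(16, 2)    # adversary: predicts the sensitive attribute s
ce = nn.CrossEntropyLoss()

opt_main = torch.optim.Adam(
    list(encoder.parameters()) + list(task_head.parameters()), lr=1e-3)
opt_adv = torch.optim.Adam(adv_head.parameters(), lr=1e-3)

def train_step(x, y, s, lambd=1.0):
    # 1) adversary learns to recover s from the frozen representation
    z = encoder(x).detach()
    adv_loss = ce(adv_head(z), s)
    opt_adv.zero_grad(); adv_loss.backward(); opt_adv.step()

    # 2) encoder + task head fit y while making the adversary's job harder
    z = encoder(x)
    main_loss = ce(task_head(z), y) - lambd * ce(adv_head(z), s)
    opt_main.zero_grad(); main_loss.backward(); opt_main.step()
```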

REFERENCES

[1] Alex Beutel, Ed H Chi, Zhiyuan Cheng, Hubert Pham, and John Anderson. 2017. Beyond Globally Optimal: Focused Learning for Improved Recommendations. In Proceedings of the 26th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 203–212.

[2] Tolga Bolukbasi, Kai-Wei Chang, James Y Zou, Venkatesh Saligrama, and Adam T Kalai. 2016. Man is to computer programmer as woman is to homemaker? debiasing word embeddings. In Advances in Neural Information Processing Systems. 4349–4357.

[3] Konstantinos Bousmalis, George Trigeorgis, Nathan Silberman, Dilip Krishnan, and Dumitru Erhan. 2016. Domain separation networks. In Advances in Neural Information Processing Systems. 343–351.

[4] John Duchi, Elad Hazan, and Yoram Singer. 2011. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12, Jul (2011), 2121–2159.

[5] Harrison Edwards and Amos Storkey. 2015. Censoring representations with an adversary. arXiv preprint arXiv: https://arxiv.org/abs/1511.05897 (2015).

[6] Yaroslav Ganin, Evgeniya Ustinova, Hana Ajakan, Pascal Germain, Hugo Larochelle, François Laviolette, Mario Marchand, and Victor Lempitsky. 2016. Domain-adversarial training of neural networks. Journal of Machine Learning Research 17, 59 (2016), 1–35.

[7] Moritz Hardt, Eric Price, Nati Srebro, et al. 2016. Equality of opportunity in supervised learning. In Advances in Neural Information Processing Systems. 3315–3323.

[8] Jon Kleinberg, Sendhil Mullainathan, and Manish Raghavan. 2016. Inherent trade-offs in the fair determination of risk scores. arXiv preprint arXiv: https://arxiv.org/abs/1609.05807 (2016).

[9] M. Lichman. 2013. UCI Machine Learning Repository. (2013). http://archive.ics.uci.edu/ml

[10] Christos Louizos, Kevin Swersky, Yujia Li, Max Welling, and Richard Zemel. 2015. The variational fair autoencoder. arXiv preprint arXiv: https://arxiv.org/abs/1511.00830 (2015).

[11] Rich Zemel, Yu Wu, Kevin Swersky, Toni Pitassi, and Cynthia Dwork. 2013. Learning fair representations. In Proceedings of the 30th International Conference on Machine Learning (ICML-13). 325–333.

9 Decision Theory for Discrimination-aware Classification

Discrimination-Aware Ensemble (DAE) naive Bayes (NBS) Reject Option based Classification (ROC)
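
Reject Option based Classification (ROC), one of the methods compared here, intervenes only where the base classifier is least confident: inside a critical band around 0.5 it hands the favorable label to the deprived group and the unfavorable one to the favored group. A minimal sketch under that reading (the threshold name theta is mine):

```python
import numpy as np

def roc_decide(p_pos, deprived, theta=0.6):
    """p_pos: P(y=1|x); deprived: True for deprived-group instances.
    Confident predictions (max(p, 1-p) >= theta) keep the usual decision;
    the uncertain band is resolved in favor of the deprived group."""
    p_pos = np.asarray(p_pos, dtype=float)
    deprived = np.asarray(deprived, dtype=bool)
    confident = np.maximum(p_pos, 1.0 - p_pos) >= theta
    default = (p_pos >= 0.5).astype(int)
    flipped = np.where(deprived, 1, 0)
    return np.where(confident, default, flipped)
```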

REFERENCES

[1] T. Calders and S. Verwer, “Three naive Bayes approaches for discrimination-free classification,” DMKD, vol. 21, no. 2, pp. 277–292, 2010.

[2] F. Kamiran, T. Calders, and M. Pechenizkiy, “Discrimination aware decision tree learning,” in ICDM, 2010, pp. 869–874.

[3] D. Pedreschi, S. Ruggieri, and F. Turini, “Discrimination-aware data mining,” in KDD, 2008.

[4] B. Luong, S. Ruggieri, and F. Turini, “k-nn as an implementation of situation testing for discrimination discovery and prevention,” in KDD, 2011, pp. 502–510.

[5] S. Hajian and J. Domingo-Ferrer, “A methodology for direct and indirect discrimination prevention in data mining,” TKDE, vol. accepted, 2012.

[6] F. Kamiran and T. Calders, “Data preprocessing techniques for classification without discrimination,” KAIS, pp. 1–33, 2012.

[7] I. Zliobaite, F. Kamiran, and T. Calders, “Handling conditional discrimination,” in ICDM, 2011, pp. 992–1001.

[8] T. Kamishima, S. Akaho, and J. Sakuma, “Fairness-aware learning through regularization approach,” in ICDMW, 2011.

[9] L. I. Kuncheva and C. J. Whitaker, “Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy,” Machine Learning, vol. 51, pp. 181–207, 2003.

[10] A. Asuncion and D. Newman, “UCI machine learning repository,” Online http://archive.ics.uci.edu/ml/, 2007.

10 Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks

Army Research Laboratory (ARL) Deep Learning (DL)
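
Defensive distillation revolves around the temperature-scaled softmax: a teacher network is trained at a high temperature T, its softened output probabilities become the labels for a distilled network trained at the same T, and the distilled network is then deployed at T = 1. A minimal sketch of the temperature softmax (names and the example T are mine):

```python
import numpy as np

def softmax_T(logits, T=1.0):
    """Softmax with temperature, softmax(z / T): large T flattens the
    distribution and exposes inter-class similarity structure."""
    z = np.asarray(logits, dtype=float) / T
    z -= z.max()               # for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Soft labels from a teacher at high temperature (e.g. T = 20) are used
# as training targets for the distilled network at the same temperature.
teacher_logits = np.array([8.0, 2.0, 1.0])
soft_labels = softmax_T(teacher_logits, T=20.0)
```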

REFERENCES

[1] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in neural information processing systems, 2012, pp. 1097–1105.

[2] T. N. Sainath, A.-r. Mohamed, B. Kingsbury, and B. Ramabhadran, “Deep convolutional neural networks for lvcsr,” in Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on. IEEE, 2013, pp. 8614–8618.

[3] P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, and Y. LeCun, “Overfeat: Integrated recognition, localization and detection using convolutional networks,” in International Conference on Learning Representations (ICLR 2014). arXiv preprint arXiv: https://arxiv.org/abs/1312.6229, 2014.

[4] G. E. Dahl, J. W. Stokes, L. Deng, and D. Yu, “Large-scale malware classification using random projections and neural networks,” in Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on. IEEE, 2013, pp. 3422–3426.

[5] Z. Yuan, Y. Lu, Z. Wang, and Y. Xue, “Droid-sec: deep learning in android malware detection,” in Proceedings of the 2014 ACM conference on SIGCOMM. ACM, 2014, pp. 371–372.

[6] E. Knorr, “How paypal beats the bad guys with machine learning,” 2015. [Online]. Available: http://www.infoworld.com/article/2907877/machine-learning/how-paypal-reduces-fraud-with-machine-learning.html

[7] N. Papernot, P. McDaniel, S. Jha, M. Fredrikson, Z. B. Celik, and A. Swami, “The limitations of deep learning in adversarial settings,” in Proceedings of the 1st IEEE European Symposium on Security and Privacy. IEEE, 2016.

[8] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus, “Intriguing properties of neural networks,” in Proceedings of the 2014 International Conference on Learning Representations. Computational and Biological Learning Society, 2014.

[9] I. J. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and harnessing adversarial examples,” in Proceedings of the 2015 International Conference on Learning Representations. Computational and Biological Learning Society, 2015.

[10] NVIDIA, “Nvidia tegra drive px: Self-driving car computer,” 2015. [Online]. Available: http://www.nvidia.com/object/drive-px.html

[11] D. Cireşan, U. Meier, J. Masci et al., “Multi-column deep neural network for traffic sign classification.”

[12] L. Huang, A. D. Joseph, B. Nelson, B. I. Rubinstein, and J. Tygar, “Adversarial machine learning,” in Proceedings of the 4th ACM workshop on Security and artificial intelligence. ACM, 2011, pp. 43–58.

[13] B. Biggio, G. Fumera et al., “Pattern recognition systems under attack: Design issues and research challenges,” International Journal of Pattern Recognition and Artificial Intelligence, vol. 28, no. 07, p. 1460002, 2014.

[14] B. Biggio, I. Corona, D. Maiorca, B. Nelson et al., “Evasion attacks against machine learning at test time,” in Machine Learning and Knowledge Discovery in Databases. Springer, 2013, pp. 387–402.

[15] A. Anjos and S. Marcel, “Counter-measures to photo attacks in face recognition: a public database and a baseline,” in Proceedings of the 2011 International Joint Conference on Biometrics. IEEE, 2011.

[16] P. Fogla and W. Lee, “Evading network anomaly detection systems: formal reasoning and practical techniques,” in Proceedings of the 13th ACM conference on Computer and communications security. ACM, 2006, pp. 59–68.

[17] S. Gu and L. Rigazio, “Towards deep neural network architectures robust to adversarial examples,” in Proceedings of the 2015 International Conference on Learning Representations. Computational and Biological Learning Society, 2015.

[18] J. Ba and R. Caruana, “Do deep nets really need to be deep?” in Advances in Neural Information Processing Systems, 2014, pp. 2654–2662.

[19] G. Hinton, O. Vinyals, and J. Dean, “Distilling the knowledge in a neural network,” in Deep Learning and Representation Learning Workshop at NIPS 2014. arXiv preprint arXiv: https://arxiv.org/abs/1503.02531, 2014.

[20] Y. LeCun and C. Cortes, “The mnist database of handwritten digits,” 1998.

[21] A. Krizhevsky and G. Hinton, “Learning multiple layers of features from tiny images,” 2009.

[22] Y. Bengio, I. J. Goodfellow, and A. Courville, “Deep learning,” 2015, book in preparation for MIT Press. [Online]. Available: http://www.iro.umontreal.ca/~bengioy/dlbook

[23] G. E. Hinton, “Learning multiple layers of representation,” Trends in cognitive sciences, vol. 11, no. 10, pp. 428–434, 2007.

[24] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning representations by back-propagating errors,” Cognitive modeling, vol. 5, 1988.

[25] J. Bergstra and Y. Bengio, “Random search for hyper-parameter optimization,” The Journal of Machine Learning Research, vol. 13, no. 1, pp. 281–305, 2012.

[26] X. Glorot, A. Bordes, and Y. Bengio, “Domain adaptation for large-scale sentiment classification: A deep learning approach,” in Proceedings of the 28th International Conference on Machine Learning (ICML-11), 2011, pp. 513–520.

[27] J. Masci, U. Meier, D. Cireşan et al., “Stacked convolutional autoencoders for hierarchical feature extraction,” in Artificial Neural Networks and Machine Learning–ICANN 2011. Springer, 2011, pp. 52–59.

[28] D. Erhan, Y. Bengio, A. Courville, P.-A. Manzagol, P. Vincent, and S. Bengio, “Why does unsupervised pre-training help deep learning?” The Journal of Machine Learning Research, vol. 11, pp. 625–660, 2010.

[29] T. Miyato, S. Maeda, M. Koyama et al., “Distributional smoothing by virtual adversarial examples,” CoRR, vol. abs/1507.00677, 2015.

[30] A. Fawzi, O. Fawzi, and P. Frossard, “Analysis of classifiers’ robustness to adversarial perturbations,” in Deep Learning Workshop at ICML 2015. arXiv preprint arXiv: https://arxiv.org/abs/1502.02590, 2015.

[31] H. Drucker and Y. Le Cun, “Improving generalization performance using double backpropagation,” Neural Networks, IEEE Transactions on, vol. 3, no. 6, pp. 991–997, 1992.

[32] A. Nguyen, J. Yosinski, and J. Clune, “Deep neural networks are easily fooled: High confidence predictions for unrecognizable images,” in In Computer Vision and Pattern Recognition (CVPR 2015). IEEE, 2015.

[33] G. Cybenko, “Approximation by superpositions of a sigmoidal function,” Mathematics of Control, Signals, and Systems, vol. 5, no. 4, p. 455, 1992.

[34] S. Shalev-Shwartz, O. Shamir, N. Srebro, and K. Sridharan, “Learnability, stability and uniform convergence,” The Journal of Machine Learning Research, vol. 11, pp. 2635–2670, 2010.

[35] J. Bergstra, O. Breuleux, F. Bastien, P. Lamblin, R. Pascanu, G. Desjardins, J. Turian, D. Warde-Farley, and Y. Bengio, “Theano: a cpu and gpu math expression compiler,” in Proceedings of the Python for scientific computing conference (SciPy), vol. 4. Austin, TX, 2010, p. 3.

[36] E. Battenberg, S. Dieleman, D. Nouri, E. Olson, A. van den Oord, C. Raffel, J. Schlüter, and S. Kaae Sønderby, “Lasagne: Lightweight library to build and train neural networks in theano,” 2015. [Online]. Available: https://github.com/Lasagne/Lasagne

[37] B. Biggio, K. Rieck, D. Ariu, C. Wressnegger et al., “Poisoning behavioral malware clustering,” in Proceedings of the 2014 Workshop on Artificial Intelligent and Security Workshop. ACM, 2014, pp. 27–36.

[38] D. Warde-Farley and I. Goodfellow, “Adversarial perturbations of deep neural networks,” in Advanced Structured Prediction, T. Hazan, G. Papandreou, and D. Tarlow, Eds., 2016.

[39] M. Barreno, B. Nelson, A. D. Joseph, and J. Tygar, “The security of machine learning,” Machine Learning, vol. 81, no. 2, pp. 121–148, 2010.

[40] W. Xu, Y. Qi et al., “Automatically evading classifiers,” in Proceedings of the 2016 Network and Distributed Systems Symposium, 2016.

[41] M. Barreno, B. Nelson, R. Sears, A. D. Joseph, and J. D. Tygar, “Can machine learning be secure?” in Proceedings of the 2006 ACM Symposium on Information, computer and communications security. ACM, 2006, pp. 16–25.

[42] B. Biggio, G. Fumera, and F. Roli, “Security evaluation of pattern classifiers under attack,” Knowledge and Data Engineering, IEEE Transactions on, vol. 26, no. 4, pp. 984–996, 2014.

[43] B. Biggio, B. Nelson, and P. Laskov, “Poisoning attacks against support vector machines,” in Proceedings of the 29th International Conference on Machine Learning, 2012.

[44] B. Biggio, B. Nelson, and P. Laskov, “Support vector machines under adversarial label noise.” in ACML, 2011, pp. 97–112.

11 Domain-Adversarial Training of Neural Networks

CMC PRID CUHK

Canada Foundation for Innovation (CFI)

domain adaptation (DA)

Deep Metric Learning (DM)

Fonds de recherche du Quebec Nature et technologies (FRQNT)

gradient reversal layer (GRL)

marginalized Stacked Denoising Autoencoders (mSDA)

Natural Sciences and Engineering Research Council (NSERC)

Proxy A-distances (PAD)
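
The gradient reversal layer (GRL) listed above is the paper's central construction: it is the identity on the forward pass and multiplies the gradient by −λ on the backward pass, so the feature extractor is trained to confuse the domain classifier. The paper's own implementation is in Caffe; the PyTorch port below is a minimal sketch of the same idea:

```python
import torch
from torch.autograd import Function

class GradReverse(Function):
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)  # identity on the forward pass

    @staticmethod
    def backward(ctx, grad_output):
        # reversed, scaled gradient flows back into the feature extractor
        return -ctx.lambd * grad_output, None

def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)

# usage: domain_logits = domain_head(grad_reverse(features, lambd))
```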

References

Hana Ajakan, Pascal Germain, Hugo Larochelle, François Laviolette, and Mario Marchand. Domain-adversarial neural networks. NIPS 2014 Workshop on Transfer and Multi-task learning: Theory Meets Practice, 2014. URL http://arxiv.org/abs/1412.4446.

Pablo Arbelaez, Michael Maire, Charless Fowlkes, and Jitendra Malik. Contour detection and hierarchical image segmentation. IEEE Transaction Pattern Analysis and Machine Intelligence, 33, 2011.

Artem Babenko, Anton Slesarev, Alexander Chigorin, and Victor S. Lempitsky. Neural codes for image retrieval. In ECCV, pages 584–599, 2014.

Mahsa Baktashmotlagh, Mehrtash Tafazzoli Harandi, Brian C. Lovell, and Mathieu Salzmann. Unsupervised domain adaptation by domain invariant projection. In ICCV, pages 769–776, 2013.

Shai Ben-David, John Blitzer, Koby Crammer, and Fernando Pereira. Analysis of representations for domain adaptation. In NIPS, pages 137–144, 2006.

Shai Ben-David, John Blitzer, Koby Crammer, Alex Kulesza, Fernando Pereira, and Jennifer Wortman Vaughan. A theory of learning from different domains. Machine Learning, 79(1-2):151–175, 2010.

John Blitzer, Ryan T. McDonald, and Fernando Pereira. Domain adaptation with structural correspondence learning. In Conference on Empirical Methods in Natural Language Processing, pages 120–128, 2006.

Karsten M. Borgwardt, Arthur Gretton, Malte J. Rasch, Hans-Peter Kriegel, Bernhard Schölkopf, and Alexander J. Smola. Integrating structured biological data by kernel maximum mean discrepancy. In ISMB, pages 49–57, 2006.

Lorenzo Bruzzone and Mattia Marconcini. Domain adaptation problems: A DASVM classification technique and a circular validation strategy. IEEE Transaction Pattern Analysis and Machine Intelligence, 32(5):770–787, 2010.

Minmin Chen, Zhixiang Eddie Xu, Kilian Q. Weinberger, and Fei Sha. Marginalized denoising autoencoders for domain adaptation. In ICML, pages 767–774, 2012.

Qiang Chen, Junshi Huang, Rogerio Feris, Lisa M. Brown, Jian Dong, and Shuicheng Yan. Deep domain adaptation for describing people based on fine-grained clothing attributes. In CVPR, June 2015.

S. Chopra, S. Balakrishnan, and R. Gopalan. Dlid: Deep learning for domain adaptation by interpolating between domains. In ICML Workshop on Challenges in Representation Learning, 2013.

Dan Cireşan, Ueli Meier, Jonathan Masci, and Jürgen Schmidhuber. Multi-column deep neural network for traffic sign classification. Neural Networks, 32:333–338, 2012.

Corinna Cortes and Mehryar Mohri. Domain adaptation and sample bias correction theory and algorithm for regression. Theor. Comput. Sci., 519:103–126, 2014.

Jeff Donahue, Yangqing Jia, Oriol Vinyals, Judy Hoffman, Ning Zhang, Eric Tzeng, and Trevor Darrell. DeCAF: A deep convolutional activation feature for generic visual recognition. In ICML, 2014.

John Duchi, Elad Hazan, and Yoram Singer. Adaptive subgradient methods for online learning and stochastic optimization. Technical report, EECS Department, University of California, Berkeley, Mar 2010.

Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, and Chih-Jen Lin. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 9:1871–1874, 2008.

Basura Fernando, Amaury Habrard, Marc Sebban, and Tinne Tuytelaars. Unsupervised visual domain adaptation using subspace alignment. In ICCV, 2013.

Yaroslav Ganin and Victor Lempitsky. Unsupervised domain adaptation by backpropagation. In ICML, pages 325–333, 2015. URL http://jmlr.org/proceedings/papers/v37/ganin15.html.

Pascal Germain, Amaury Habrard, François Laviolette, and Emilie Morvant. A PAC-Bayesian approach for domain adaptation with specialization to linear classifiers. In ICML, pages 738–746, 2013.

Xavier Glorot, Antoine Bordes, and Yoshua Bengio. Domain adaptation for large-scale sentiment classification: A deep learning approach. In ICML, pages 513–520, 2011.

Boqing Gong, Yuan Shi, Fei Sha, and Kristen Grauman. Geodesic flow kernel for unsupervised domain adaptation. In CVPR, pages 2066–2073, 2012.

Boqing Gong, Kristen Grauman, and Fei Sha. Connecting the dots with landmarks: Discriminatively learning domain-invariant features for unsupervised domain adaptation. In ICML, pages 222–230, 2013.

Shaogang Gong, Marco Cristani, Shuicheng Yan, and Chen Change Loy. Person re-identification. Springer, 2014.

Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In NIPS, 2014.

Raghuraman Gopalan, Ruonan Li, and Rama Chellappa. Domain adaptation for object recognition: An unsupervised approach. In ICCV, pages 999–1006, 2011.

Doug Gray, Shane Brennan, and Hai Tao. Evaluating appearance models for recognition, reacquisition, and tracking. In IEEE International Workshop on Performance Evaluation for Tracking and Surveillance, Rio de Janeiro, 2007. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.331.7285&rep=rep1&type=pdf

Martin Hirzer, Csaba Beleznai, Peter M. Roth, and Horst Bischof. Person re-identification by descriptive and discriminative classification. In SCIA, 2011.

Judy Hoffman, Eric Tzeng, Jeff Donahue, Yangqing Jia, Kate Saenko, and Trevor Darrell. One-shot adaptation of supervised deep convolutional models. CoRR, abs/1312.6204, 2013. URL http://arxiv.org/abs/1312.6204.

Fei Huang and Alexander Yates. Biased representation learning for domain adaptation. In Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 1313–1323, 2012.

Jiayuan Huang, Alexander J. Smola, Arthur Gretton, Karsten M. Borgwardt, and Bernhard Schölkopf. Correcting sample selection bias by unlabeled data. In NIPS, pages 601–608, 2006.

Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell. Caffe: Convolutional architecture for fast feature embedding. CoRR, abs/1408.5093, 2014.

Daniel Kifer, Shai Ben-David, and Johannes Gehrke. Detecting change in data streams. In Very Large Data Bases, pages 180–191, 2004.

Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton. Imagenet classification with deep convolutional neural networks. In NIPS, pages 1097–1105, 2012.

Alexandre Lacoste, François Laviolette, and Mario Marchand. Bayesian comparison of machine learning algorithms on single and multiple datasets. In AISTATS, pages 665–675, 2012.

Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, November 1998.

Wei Li and Xiaogang Wang. Locally aligned feature transforms across views. In CVPR, pages 3594–3601, 2013. https://www.cv-foundation.org/openaccess/content_cvpr_2013/papers/Li_Locally_Aligned_Feature_2013_CVPR_paper.pdf

Yujia Li, Kevin Swersky, and Richard Zemel. Unsupervised domain adaptation by domain invariant projection. In NIPS 2014 Workshop on Transfer and Multitask Learning, 2014.

Joerg Liebelt and Cordelia Schmid. Multi-view object class detection with a 3d geometric model. In CVPR, 2010.

Chunxiao Liu, Chen Change Loy, Shaogang Gong, and Guijin Wang. POP: person re-identification post-rank optimisation. In ICCV, pages 441–448, 2013.

Mingsheng Long and Jianmin Wang. Learning transferable features with deep adaptation networks. CoRR, abs/1502.02791, 2015.

Andy Jinhua Ma, Jiawei Li, Pong C. Yuen, and Ping Li. Cross-domain person re-identification using domain adaptation ranking svms. IEEE Transactions on Image Processing, 24(5):1599–1613, 2015.

Yishay Mansour, Mehryar Mohri, and Afshin Rostamizadeh. Domain adaptation: Learning bounds and algorithms. In COLT, 2009a.

Yishay Mansour, Mehryar Mohri, and Afshin Rostamizadeh. Multiple source adaptation and the Rényi divergence. In UAI, pages 367–374, 2009b.

Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, and Andrew Y. Ng. Reading digits in natural images with unsupervised feature learning. In NIPS Workshop on Deep Learning and Unsupervised Feature Learning, 2011.

M. Oquab, L. Bottou, I. Laptev, and J. Sivic. Learning and transferring mid-level image representations using convolutional neural networks. In CVPR, 2014.

Sakrapee Paisitkriangkrai, Chunhua Shen, and Anton van den Hengel. Learning to rank in person re-identification with metric ensembles. CoRR, abs/1503.01543, 2015. URL http://arxiv.org/abs/1503.01543.

Sinno Jialin Pan, Ivor W. Tsang, James T. Kwok, and Qiang Yang. Domain adaptation via transfer component analysis. IEEE Transactions on Neural Networks, 22(2):199–210, 2011.

Kate Saenko, Brian Kulis, Mario Fritz, and Trevor Darrell. Adapting visual category models to new domains. In ECCV, pages 213–226, 2010.

Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1):1929–1958, 2014.

Michael Stark, Michael Goesele, and Bernt Schiele. Back to the future: Learning shape models from 3d CAD data. In BMVC, pages 1–11, 2010.

Baochen Sun and Kate Saenko. From virtual to reality: Fast adaptation of virtual object detectors to real domains. In BMVC, 2014.

Eric Tzeng, Judy Hoffman, Ning Zhang, Kate Saenko, and Trevor Darrell. Deep domain confusion: Maximizing for domain invariance. CoRR, abs/1412.3474, 2014. URL http://arxiv.org/abs/1412.3474.

Laurens van der Maaten. Barnes-Hut-SNE. CoRR, abs/1301.3342, 2013. URL http://arxiv.org/abs/1301.3342.

David Vázquez, Antonio Manuel López, Javier Marín, Daniel Ponsa, and David Gerónimo Gómez. Virtual and real world adaptation for pedestrian detection. IEEE Transaction Pattern Analysis and Machine Intelligence, 36(4):797–809, 2014.

Pascal Vincent, Hugo Larochelle, Yoshua Bengio, and Pierre-Antoine Manzagol. Extracting and composing robust features with denoising autoencoders. In ICML, pages 1096–1103, 2008.

Dong Yi, Zhen Lei, and Stan Z. Li. Deep metric learning for practical person re-identification. CoRR, abs/1407.4979, 2014. URL http://arxiv.org/abs/1407.4979.

Matthew D. Zeiler. ADADELTA: an adaptive learning rate method. CoRR, abs/1212.5701, 2012. URL http://arxiv.org/abs/1212.5701.

Matthew D. Zeiler and Rob Fergus. Visualizing and understanding convolutional networks. CoRR, abs/1311.2901, 2013. URL http://arxiv.org/abs/1311.2901.

Ziming Zhang and Venkatesh Saligrama. Person re-identification via structured prediction. CoRR, abs/1406.4444, 2014. URL http://arxiv.org/abs/1406.4444.

Rui Zhao, Wanli Ouyang, and Xiaogang Wang. Person re-identification by saliency learning. CoRR, abs/1412.1908, 2014. URL http://arxiv.org/abs/1412.1908.

Erheng Zhong, Wei Fan, Qiang Yang, Olivier Verscheure, and Jiangtao Ren. Cross validation framework to choose amongst models and datasets for transfer learning. In Machine Learning and Knowledge Discovery in Databases, pages 547–562. Springer, 2010.

12 Empirical Risk Minimization under Fairness Constraints

ERM

FERM

ACC

Equal Opportunity (EO)

Reproducing Kernel Hilbert Space (RKHS)
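
In rough terms, FERM is ordinary empirical risk minimization with an Equal Opportunity style constraint attached. A sketch of the formulation as I read it (notation mine; ε is the tolerated unfairness):

```math
\min_{f \in \mathcal{F}} \; \frac{1}{n} \sum_{i=1}^{n} \ell\big(f(x_i), y_i\big)
\quad \text{subject to} \quad
\big| L^{+}_{a}(f) - L^{+}_{b}(f) \big| \le \varepsilon
```

where L⁺_g(f) denotes the risk of f restricted to the positive examples (y = +1) of group g; taking ε = 0 asks for exact Equal Opportunity, and the paper studies how a relaxation of this constraint can be folded into the learning problem with guarantees.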

References

[1] C. Dwork, N. Immorlica, A. T. Kalai, and M. D. M. Leiserson. Decoupled classifiers for group-fair and efficient machine learning. In Conference on Fairness, Accountability and Transparency, 2018.

[2] M. Hardt, E. Price, and N. Srebro. Equality of opportunity in supervised learning. In Advances in neural information processing systems, 2016.

[3] M. B. Zafar, I. Valera, M. Gomez Rodriguez, and K. P. Gummadi. Fairness beyond disparate treatment & disparate impact: Learning classification without disparate mistreatment. In International Conference on World Wide Web, 2017.

[4] R. Zemel, Y. Wu, K. Swersky, T. Pitassi, and C. Dwork. Learning fair representations. In International Conference on Machine Learning, 2013.

[5] N. Kilbertus, M. Rojas-Carulla, G. Parascandolo, M. Hardt, D. Janzing, and B. Schölkopf. Avoiding discrimination through causal reasoning. In Advances in Neural Information Processing Systems, 2017.

[6] M. J. Kusner, J. Loftus, C. Russell, and R. Silva. Counterfactual fairness. In Advances in Neural Information Processing Systems, 2017.

[7] F. Calmon, D. Wei, B. Vinzamuri, K. Natesan Ramamurthy, and K. R. Varshney. Optimized preprocessing for discrimination prevention. In Advances in Neural Information Processing Systems, 2017.

[8] M. Joseph, M. Kearns, J. H. Morgenstern, and A. Roth. Fairness in learning: Classic and contextual bandits. In Advances in Neural Information Processing Systems, 2016.

[9] F. Chierichetti, R. Kumar, S. Lattanzi, and S. Vassilvitskii. Fair clustering through fairlets. In Advances in Neural Information Processing Systems, 2017.

[10] S. Jabbari, M. Joseph, M. Kearns, J. Morgenstern, and A. Roth. Fair learning in markovian environments. In Conference on Fairness, Accountability, and Transparency in Machine Learning, 2016.

[11] S. Yao and B. Huang. Beyond parity: Fairness objectives for collaborative filtering. In Advances in Neural Information Processing Systems, 2017.

[12] K. Lum and J. Johndrow. A statistical framework for fair predictive algorithms. arXiv preprint arXiv: https://arxiv.org/abs/1610.08077, 2016.

[13] I. Zliobaite. On the relation between accuracy and fairness in binary classification. arXiv preprint arXiv: https://arxiv.org/abs/1505.05723, 2015.

[14] T. Calders, F. Kamiran, and M. Pechenizkiy. Building classifiers with independency constraints. In IEEE international conference on Data mining, 2009.

[15] G. Pleiss, M. Raghavan, F. Wu, J. Kleinberg, and K. Q. Weinberger. On fairness and calibration. In Advances in Neural Information Processing Systems, 2017.

[16] A. Beutel, J. Chen, Z. Zhao, and E. H. Chi. Data decisions and theoretical implications when adversarially learning fair representations. In Conference on Fairness, Accountability, and Transparency in Machine Learning, 2017.

[17] M. Feldman, S. A. Friedler, J. Moeller, C. Scheidegger, and S. Venkatasubramanian. Certifying and removing disparate impact. In International Conference on Knowledge Discovery and Data Mining, 2015.

[18] A. Agarwal, A. Beygelzimer, M. Dudík, and J. Langford. A reductions approach to fair classification. In Conference on Fairness, Accountability, and Transparency in Machine Learning, 2017.

[19] A. Agarwal, A. Beygelzimer, M. Dudík, J. Langford, and H. Wallach. A reductions approach to fair classification. arXiv preprint arXiv: https://arxiv.org/abs/1803.02453, 2018.

[20] B. Woodworth, S. Gunasekar, M. I. Ohannessian, and N. Srebro. Learning non-discriminatory predictors. In Computational Learning Theory, 2017.

[21] A. K. Menon and R. C. Williamson. The cost of fairness in binary classification. In Conference on Fairness, Accountability and Transparency, 2018.

[22] M. B. Zafar, I. Valera, M. Rodriguez, K. Gummadi, and A. Weller. From parity to preference-based notions of fairness in classification. In Advances in Neural Information Processing Systems, 2017.

[23] Y. Bechavod and K. Ligett. Penalizing unfairness in binary classification. arXiv preprint arXiv: https://arxiv.org/abs/1707.00044v3, 2018.

[24] M. B. Zafar, I. Valera, M. Gomez Rodriguez, and K. P. Gummadi. Fairness constraints: Mechanisms for fair classification. In International Conference on Artificial Intelligence and Statistics, 2017.

[25] T. Kamishima, S. Akaho, and J. Sakuma. Fairness-aware learning through regularization approach. In International Conference on Data Mining Workshops, 2011.

[26] M. Kearns, S. Neel, A. Roth, and Z. S. Wu. Preventing fairness gerrymandering: Auditing and learning for subgroup fairness. arXiv preprint arXiv: https://arxiv.org/abs/1711.05144, 2017.

[27] A. Pérez-Suay, V. Laparra, G. Mateo-García, J. Muñoz-Marí, L. Gómez-Chova, and G. Camps-Valls. Fair kernel learning. In Machine Learning and Knowledge Discovery in Databases, 2017.

[28] R. Berk, H. Heidari, S. Jabbari, M. Joseph, M. Kearns, J. Morgenstern, S. Neel, and A. Roth. A convex framework for fair regression. arXiv preprint arXiv: https://arxiv.org/abs/1706.02409, 2017.

[29] D. Alabi, N. Immorlica, and A. T. Kalai. When optimizing nonlinear objectives is no harder than linear objectives. arXiv preprint arXiv: https://arxiv.org/abs/1804.04503, 2018.

[30] M. Olfat and A. Aswani. Spectral algorithms for computing fair support vector machines. In International Conference on Artificial Intelligence and Statistics, 2018.

[31] J. Adebayo and L. Kagal. Iterative orthogonal feature projection for diagnosing bias in black-box models. In Conference on Fairness, Accountability, and Transparency in Machine Learning, 2016.

[32] F. Kamiran and T. Calders. Classifying without discriminating. In International Conference on Computer, Control and Communication, 2009.

[33] F. Kamiran and T. Calders. Data preprocessing techniques for classification without discrimination. Knowledge and Information Systems, 33(1):1–33, 2012.

[34] F. Kamiran and T. Calders. Classification with no discrimination by preferential sampling. In Machine Learning Conference, 2010.

[35] S. Shalev-Shwartz and S. Ben-David. Understanding machine learning: From theory to algorithms. Cambridge University Press, 2014.

[36] P. L. Bartlett and S. Mendelson. Rademacher and gaussian complexities: Risk bounds and structural results. Journal of Machine Learning Research, 3(Nov):463–482, 2002.

[37] A. Maurer. A note on the PAC-Bayesian theorem. arXiv preprint https://arxiv.org/abs/cs/0411099, 2004.

[38] J. Shawe-Taylor and N. Cristianini. Kernel methods for pattern analysis. Cambridge University Press, 2004.

[39] A. J. Smola and B. Schölkopf. Learning with Kernels. MIT Press, 2001.

[40] B. Schölkopf, R. Herbrich, and A. Smola. A generalized representer theorem. In Computational Learning Theory, 2001.

[41] V. N. Vapnik. Statistical learning theory. Wiley New York, 1998.

[42] R. T. Rockafellar. Convex Analysis. Princeton University Press, 1970.

13 Enhancing the Accuracy and Fairness of Human Decision Making

References

[1] S. Barocas and A. D. Selbst. Big data's disparate impact. California Law Review, 2016.

[2] S. Boyd and L. Vandenberghe. Convex optimization. Cambridge university press, 2004.

[3] S. Corbett-Davies, E. Pierson, A. Feller, S. Goel, and A. Huq. Algorithmic decision making and the cost of fairness. KDD, 2017.

[4] C. Dwork, M. Hardt, T. Pitassi, O. Reingold, and R. Zemel. Fairness through awareness. In ITCS, 2012.

[5] M. Feldman, S. A. Friedler, J. Moeller, C. Scheidegger, and S. Venkatasubramanian. Certifying and removing disparate impact. In KDD, 2015.

[6] M. Hardt, E. Price, N. Srebro, et al. Equality of opportunity in supervised learning. In NIPS, 2016.

[7] J. Kleinberg, H. Lakkaraju, J. Leskovec, J. Ludwig, and S. Mullainathan. Human decisions and machine predictions. The Quarterly Journal of Economics, 133(1):237–293, 2017.

[8] J. Larson, S. Mattu, L. Kirchner, and J. Angwin. https://github.com/propublica/compas-analysis, 2016.

[9] M. Mastrolilli and G. Stamoulis. Constrained matching problems in bipartite graphs. In ISCO, pages 344–355. Springer, 2012.

[10] C. Muñoz, M. Smith, and D. Patil. Big Data: A Report on Algorithmic Systems, Opportunity, and Civil Rights. Executive Office of the President, The White House, 2016.

[11] I. Osband, D. Russo, and B. Van Roy. (more) efficient reinforcement learning via posterior sampling. In NIPS, pages 3003–3011, 2013.

[12] D. B. West et al. Introduction to graph theory, volume 2. Prentice hall Upper Saddle River, 2001.

[13] B. Zafar, I. Valera, M. Gomez-Rodriguez, and K. Gummadi. Fairness beyond disparate treatment & disparate impact: Learning classification without disparate mistreatment. In WWW, 2017.

[14] B. Zafar, I. Valera, M. Gomez-Rodriguez, and K. Gummadi. Training fair classifiers. AISTATS, 2017.

[15] B. Zafar, I. Valera, M. Gomez-Rodriguez, K. Gummadi, and A. Weller. From parity to preference: Learning with cost-effective notions of fairness. In NIPS, 2017.

14 Evolution of collective fairness in hybrid populations of humans and agents

MIT USD

Amazon Mechanical Turk (AMT)

Evolutionary Game Theory (EGT)

Multiplayer version of the Ultimatum Game (MUG)

Ultimatum Game (UG)
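
In the Multiplayer Ultimatum Game (MUG), a single proposer offers a split of the pot to a group of responders, and the split takes effect only if enough responders accept. A minimal round under the conventions I assume to be standard here (acceptance quorum M, offered share divided equally among the responders):

```python
import numpy as np

def mug_round(offer, thresholds, M):
    """offer: fraction of the pot offered to the responders as a group;
    thresholds: responder i accepts iff offer >= thresholds[i];
    M: minimum number of acceptances for the deal to go through."""
    thresholds = np.asarray(thresholds, dtype=float)
    accepts = int(np.sum(thresholds <= offer))
    n_resp = len(thresholds)
    if accepts >= M:
        return 1.0 - offer, [offer / n_resp] * n_resp  # proposer, responders
    return 0.0, [0.0] * n_resp                         # rejected: all get 0

# example: offer 40% of the pot to 4 responders, quorum of 2
print(mug_round(0.4, [0.2, 0.3, 0.5, 0.6], M=2))
```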

References

Azaria, A.; Aumann, Y.; and Kraus, S. 2012. Automated strategies for determining rewards for human work. In AAAI-12, 1514–1521. AAAI Press.

Blount, S. 1995. When social outcomes aren’t fair: The effect of causal attributions on preferences. Organ. Behav. Hum. Decis. Process 63(2):131–144.

Brânzei, S.; Caragiannis, I.; Kurokawa, D.; and Procaccia, A. D. 2016. An algorithmic framework for strategic fair division. In AAAI-16, 418–424. AAAI Press.

Camerer, C. 2003. Behavioral game theory: Experiments in strategic interaction. Princeton University Press.

Chen, Y.; Lai, J. K.; Parkes, D. C.; and Procaccia, A. D. 2010. Truth, justice, and cake cutting. In AAAI-10, 756–761. AAAI Press.

Chevaleyre, Y.; Dunne, P. E.; Endriss, U.; Lang, J.; Lemaitre, M.; Maudet, N.; Padget, J.; Phelps, S.; Rodriguez-Aguilar, J. A.; and Sousa, P. 2006. Issues in multiagent resource allocation. Informatica 3–31.

Chica, M.; Chiong, R.; Kirley, M.; and Ishibuchi, H. 2017. A networked n-player trust game and its evolutionary dynamics. IEEE Trans. Evol. Comput.

Chiong, R., and Kirley, M. 2012. Effects of iterated interactions in multiplayer spatial evolutionary games. IEEE Trans. Evol. Comput. 16(4):537–555.

Correia, F.; Mascarenhas, S.; Prada, R.; Melo, F. S.; and Paiva, A. 2018. Group-based emotions in teams of humans and robots. In HRI-18, 261–269. ACM.

de Jong, S., and Tuyls, K. 2011. Human-inspired computational fairness. Auton. Agents. Multi. Agent. Syst. 22(1):103.

de Jong, S.; Uyttendaele, S.; and Tuyls, K. 2008. Learning to reach agreement in a continuous ultimatum game. J. Artif. Intell. Res. 33:551–574.

de Melo, C. M.; Marsella, S.; and Gratch, J. 2018. Social decisions and fairness change when people’s interests are represented by autonomous agents. Auton. Agents. Multi. Agent. Syst. 32(1):163–187.

Fehr, E., and Fischbacher, U. 2003. The nature of human altruism. Nature 425(6960):785.

Grimm, V.; Feicht, R.; Rau, H.; and Stephan, G. 2017. On the impact of quotas and decision rules in ultimatum collective bargaining. Eur. Econ. Rev. 100:175–192.

Güth, W.; Schmittberger, R.; and Schwarze, B. 1982. An experimental analysis of ultimatum bargaining. J. Econ. Behav. Organ. 3(4):367–388.

Han, T.; Pereira, L. M.; Martinez-Vaquero, L. A.; and Lenaerts, T. 2017. Centralized vs. personalized commitments and their influence on cooperation in group interactions. In AAAI-17, 2999–3005. AAAI Press.

Han, T. A.; Pereira, L. M.; and Lenaerts, T. 2017. Evolution of commitment and level of participation in public goods games. Auton. Agents. Multi. Agent. Syst. 31(3):561–583.

Jennings, N. R.; Faratin, P.; Lomuscio, A. R.; Parsons, S.; Wooldridge, M. J.; and Sierra, C. 2001. Automated negotiation: prospects, methods and challenges. Group. Decis. Negot. 10(2):199–215.

Mason, W., and Suri, S. 2012. Conducting behavioral research on amazon’s mechanical turk. Behav. Res. Methods. 44(1):1–23.

Morales, J.; Wooldridge, M.; Rodríguez-Aguilar, J. A.; and López-Sánchez, M. 2018. Off-line synthesis of evolutionarily stable normative systems. Auton. Agents. Multi. Agent. Syst. 1–37.

Nowak, M. A.; Page, K. M.; and Sigmund, K. 2000. Fairness versus reason in the ultimatum game. Science 289(5485):1773–1775.

Osborne, M. J. 2004. An introduction to game theory. Oxford University Press, New York.

Page, K. M.; Nowak, M. A.; and Sigmund, K. 2000. The spatial ultimatum game. Proc. R. Soc. Lond. B Biol. Sci. 267(1458):2177–2182.

Paiva, A.; Santos, F. P.; and Santos, F. C. 2018. Engineering pro-sociality with autonomous agents. In AAAI-18, 7994– 7999. AAAI Press.

Parkes, D. C., and Wellman, M. P. 2015. Economic reasoning and artificial intelligence. Science 349(6245):267–272.

Rosenfeld, A., and Kraus, S. 2015. Providing arguments in discussions based on the prediction of human argumentative behavior. In AAAI-15, 1320–1327. AAAI Press.

Sanfey, A. G.; Rilling, J. K.; Aronson, J. A.; Nystrom, L. E.; and Cohen, J. D. 2003. The neural basis of economic decision-making in the ultimatum game. Science 300(5626):1755–1758.

Santos, F. P.; Santos, F. C.; Paiva, A.; and Pacheco, J. M. 2015. Evolutionary dynamics of group fairness. J. Theor. Biol. 378:96–102.

Santos, F. P.; Santos, F. C.; Melo, F. S.; Paiva, A.; and Pacheco, J. M. 2016. Dynamics of fairness in groups of autonomous learning agents. In AAMAS 2016, Workshops Best Papers, 107–126. Springer, Cham.

Santos, F. P.; Pacheco, J. M.; and Santos, F. C. 2018. Social norms of cooperation with costly reputation building. In AAAI-18, 4727–4734. AAAI Press.

Segal-Halevi, E.; Hassidim, A.; and Aumann, Y. 2015. Envy-free cake-cutting in two dimensions. In AAAI-15, 1021–1028. AAAI Press.

Shirado, H., and Christakis, N. A. 2017. Locally noisy autonomous agents improve global human coordination in network experiments. Nature 545(7654):370.

Takesue, H.; Ozawa, A.; and Morikawa, S. 2017. Evolution of favoritism and group fairness in a co-evolving three-person ultimatum game. EPL 118(4):48002.

Traulsen, A.; Nowak, M. A.; and Pacheco, J. M. 2006. Stochastic dynamics of invasion and fixation. Phys. Rev. E 74(1):011909.

Van Segbroeck, S.; Pacheco, J. M.; Lenaerts, T.; and Santos, F. C. 2012. Emergence of fairness in repeated group interactions. Phys. Rev. Lett. 108(15):158104.

Weibull, J. W. 1997. Evolutionary game theory. MIT Press.

15 Explaining and Harnessing Adversarial Examples

RBF L-BFGS LSTMs MP-DBM
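This is the paper that introduced the fast gradient sign method, x_adv = x + ε·sign(∇_x J(θ, x, y)). A minimal NumPy sketch, assuming the caller supplies a grad_fn returning the loss gradient (grad_fn and the toy loss below are illustrative):

```python
import numpy as np

def fgsm(x, grad_fn, epsilon=0.1):
    """Fast gradient sign method: x_adv = x + epsilon * sign(grad_x J).

    x       -- input array
    grad_fn -- callable returning the gradient of the loss w.r.t. x
               (assumed to be supplied by the caller; illustrative)
    epsilon -- perturbation budget in the L-infinity norm
    """
    return x + epsilon * np.sign(grad_fn(x))

# Toy example: loss J(x) = 0.5 * ||x||^2, so the gradient is x itself
x = np.array([0.5, -1.0, 2.0])
print(fgsm(x, grad_fn=lambda v: v, epsilon=0.1))  # [ 0.6 -1.1  2.1]
```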

REFERENCES

Bastien, Frédéric, Lamblin, Pascal, Pascanu, Razvan, Bergstra, James, Goodfellow, Ian J., Bergeron, Arnaud, Bouchard, Nicolas, and Bengio, Yoshua. Theano: new features and speed improvements. Deep Learning and Unsupervised Feature Learning NIPS 2012 Workshop, 2012.

Bergstra, James, Breuleux, Olivier, Bastien, Frédéric, Lamblin, Pascal, Pascanu, Razvan, Desjardins, Guillaume, Turian, Joseph, Warde-Farley, David, and Bengio, Yoshua. Theano: a CPU and GPU math expression compiler. In Proceedings of the Python for Scientific Computing Conference (SciPy), June 2010. Oral Presentation.

Chalupka, K., Perona, P., and Eberhardt, F. Visual Causal Feature Learning. ArXiv e-prints, https://arxiv.org/abs/1412.2309 December 2014.

Dean, Jeffrey, Corrado, Greg S., Monga, Rajat, Chen, Kai, Devin, Matthieu, Le, Quoc V., Mao, Mark Z., Ranzato, Marc'Aurelio, Senior, Andrew, Tucker, Paul, Yang, Ke, and Ng, Andrew Y. Large scale distributed deep networks. In NIPS, 2012.

Deng, Jia, Dong, Wei, Socher, Richard, Li, Li-Jia, Li, Kai, and Fei-Fei, Li. ImageNet: A large-scale hierarchical image database. In CVPR, 2009.

Glorot, Xavier, Bordes, Antoine, and Bengio, Yoshua. Deep sparse rectifier neural networks. In JMLR W&CP: Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics (AISTATS 2011), April 2011.

Goodfellow, Ian J., Mirza, Mehdi, Courville, Aaron, and Bengio, Yoshua. Multi-prediction deep Boltzmann machines. In Neural Information Processing Systems, December 2013a.

Goodfellow, Ian J., Warde-Farley, David, Lamblin, Pascal, Dumoulin, Vincent, Mirza, Mehdi, Pascanu, Razvan, Bergstra, James, Bastien, Frédéric, and Bengio, Yoshua. Pylearn2: a machine learning research library. arXiv preprint https://arxiv.org/abs/1308.4214, 2013b.

Goodfellow, Ian J., Warde-Farley, David, Mirza, Mehdi, Courville, Aaron, and Bengio, Yoshua. Maxout networks. In Dasgupta, Sanjoy and McAllester, David (eds.), International Conference on Machine Learning, pp. 1319–1327, 2013c.

Gu, Shixiang and Rigazio, Luca. Towards deep neural network architectures robust to adversarial examples. In NIPS Workshop on Deep Learning and Representation Learning, 2014.

Hochreiter, S. and Schmidhuber, J. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.

Hornik, Kurt, Stinchcombe, Maxwell, and White, Halbert. Multilayer feedforward networks are universal approximators. Neural Networks, 2:359–366, 1989.

Jarrett, Kevin, Kavukcuoglu, Koray, Ranzato, Marc’Aurelio, and LeCun, Yann. What is the best multi-stage architecture for object recognition? In Proc. International Conference on Computer Vision (ICCV’09), pp. 2146–2153. IEEE, 2009.

Krizhevsky, Alex and Hinton, Geoffrey. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009.

Nguyen, A., Yosinski, J., and Clune, J. Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images. ArXiv e-prints https://arxiv.org/abs/1412.1897 , December 2014.

Rust, Nicole, Schwartz, Odelia, Movshon, J. Anthony, and Simoncelli, Eero. Spatiotemporal elements of macaque V1 receptive fields. Neuron, 46(6):945–956, 2005.

Srivastava, Nitish, Hinton, Geoffrey, Krizhevsky, Alex, Sutskever, Ilya, and Salakhutdinov, Ruslan. Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15 (1):1929–1958, 2014.

Szegedy, Christian, Liu, Wei, Jia, Yangqing, Sermanet, Pierre, Reed, Scott, Anguelov, Dragomir, Erhan, Dumitru, Vanhoucke, Vincent, and Rabinovich, Andrew. Going deeper with convolutions. Technical report, arXiv preprint arXiv: https://arxiv.org/abs/1409.4842, 2014a.

Szegedy, Christian, Zaremba, Wojciech, Sutskever, Ilya, Bruna, Joan, Erhan, Dumitru, Goodfellow, Ian J., and Fergus, Rob. Intriguing properties of neural networks. ICLR, abs/1312.6199, 2014b. URL http://arxiv.org/abs/1312.6199.

16 Exploiting Synthetically Generated Data with Semi-Supervised Learning for Small and Imbalanced Datasets

EU FEDER LSP LSPFS SMOTE USP

semi-supervised learning (SSL) Association for the Advancement of Artificial Intelligence (AAAI) Missing Completely At Random (MCAR) Spanish Ministry of Economy and Competitiveness (MINECO) SUrvey Network for Deep Imaging Analysis and Learning (SUNDIAL)
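Since SMOTE comes up throughout this reference list: it synthesizes minority-class samples by interpolating between a minority point and one of its k nearest minority neighbours. A minimal NumPy sketch (the function name, k, and RNG seeding are illustrative, with no edge-case handling):

```python
import numpy as np

def smote_sample(X_min, k=5, rng=np.random.default_rng(0)):
    """Generate one synthetic minority-class sample, SMOTE-style.

    X_min -- (n, d) array of minority-class points; n must exceed k.
    Picks a random minority point, one of its k nearest minority
    neighbours, and interpolates at a random position between them.
    """
    i = rng.integers(len(X_min))
    d = np.linalg.norm(X_min - X_min[i], axis=1)
    neighbours = np.argsort(d)[1:k + 1]   # skip the point itself (distance 0)
    j = rng.choice(neighbours)
    lam = rng.random()                    # interpolation factor in [0, 1]
    return X_min[i] + lam * (X_min[j] - X_min[i])

X_min = np.random.default_rng(1).normal(size=(20, 2))
print(smote_sample(X_min, k=3))
```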

References

[Ben-David, Lu, and Pál 2008] Ben-David, S.; Lu, T.; and Pál, D. 2008. Does unlabeled data provably help? Worst-case analysis of the sample complexity of semi-supervised learning. In 21st Annual Conference on Learning Theory.

[Chapelle, Schölkopf, and Zien 2010] Chapelle, O.; Schölkopf, B.; and Zien, A. 2010. Semi-Supervised Learning. The MIT Press, 1st edition.

[Chapelle, Sindhwani, and Keerthi 2008] Chapelle, O.; Sindhwani, V.; and Keerthi, S. S. 2008. Optimization techniques for semisupervised support vector machines. Journal of Machine Learning Research 9:203–233.

[Chawla et al. 2002] Chawla, N. V.; Bowyer, K. W.; Hall, L. O.; and Kegelmeyer, W. P. 2002. SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research 16:321–357.

[Cireşan et al. 2010] Cireşan, D. C.; Meier, U.; Gambardella, L. M.; and Schmidhuber, J. 2010. Deep, big, simple neural nets for handwritten digit recognition. Neural Comput. 22(12):3207–3220.

[Cortes and Vapnik 1995] Cortes, C., and Vapnik, V. 1995. Support-vector networks. Machine Learning 20(3):273–297.

[Cozman, Cohen, and Cirelo 2003] Cozman, F. G.; Cohen, I.; and Cirelo, M. C. 2003. Semi-supervised learning of mixture models. In Proceedings of the Twentieth International Conference on International Conference on Machine Learning, 99–106.

[Demsar 2006] Demsar, J. 2006. Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research 7:1–30.

[Forman and Cohen 2004] Forman, G., and Cohen, I. 2004. Learning from Little: Comparison of Classifiers Given Little Training. Berlin, Heidelberg: Springer Berlin Heidelberg. 161–172.

[Galar et al. 2012] Galar, M.; Fernández, A.; Barrenechea, E.; Bustince, H.; and Herrera, F. 2012. A review on ensembles for the class imbalance problem: Bagging-, boosting-, and hybrid-based approaches. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews 42(4):463–484.

[Huang et al. 2006] Huang, J.; Smola, A. J.; Gretton, A.; Borgwardt, K. M.; and Scholkopf, B. 2006. Correcting sample selection bias by unlabeled data. In Proceedings of the 19th International Conference on Neural Information Processing Systems, 601–608. Cambridge, MA, USA: MIT Press.

[Krizhevsky, Sutskever, and Hinton 2012] Krizhevsky, A.; Sutskever, I.; and Hinton, G. E. 2012. Imagenet classification with deep convolutional neural networks. In Pereira, F.; Burges, C. J. C.; Bottou, L.; and Weinberger, K. Q., eds., Advances in Neural Information Processing Systems 25. Curran Associates, Inc. 1097–1105.

[Kubat and Matwin 1997] Kubat, M., and Matwin, S. 1997. Addressing the curse of imbalanced training sets: One-sided selection. In Proceedings of the International Conference on Machine Learning, 179–186.

[Li and Wen 2014] Li, D.-C., and Wen, I.-H. 2014. A genetic algorithm-based virtual sample generation technique to improve small data set learning. Neurocomputing 143:222–230.

[Lichman 2013] Lichman, M. 2013. UCI machine learning repository.

[Niyogi, Girosi, and Poggio 2002] Niyogi, P.; Girosi, F.; and Poggio, T. 2002. Incorporating prior information in machine learning by creating virtual examples. Proceedings of the IEEE 86(11):2196–2209.

[Pérez-Ortiz et al. 2016] Pérez-Ortiz, M.; Gutiérrez, P. A.; Tino, P.; and Hervás-Martínez, C. 2016. Oversampling the minority class in the feature space. IEEE Transactions on Neural Networks and Learning Systems 27(9):1947–1961.

[Sánchez-Monedero et al. 2013] Sánchez-Monedero, J.; Gutiérrez, P. A.; Pérez-Ortiz, M.; and Hervás-Martínez, C. 2013. An n-spheres based synthetic data generator for supervised classification. In Advances in Computational Intelligence, 613–621. Berlin, Heidelberg: Springer Berlin Heidelberg.

[Shahshahani and Landgrebe 1994] Shahshahani, B. M., and Landgrebe, D. A. 1994. The effect of unlabeled samples in reducing the small sample size problem and mitigating the hughes phenomenon. IEEE Transactions on Geoscience and Remote Sensing 32(5):1087–1095.

[Simard, Steinkraus, and Platt 2003] Simard, P. Y.; Steinkraus, D.; and Platt, J. C. 2003. Best practices for convolutional neural networks applied to visual document analysis. In Proceedings of the Seventh International Conference on Document Analysis and Recognition - Volume 2, ICDAR ’03, 958–. Washington, DC, USA: IEEE Computer Society.

[Sindhwani and Keerthi 2006] Sindhwani, V., and Keerthi, S. S. 2006. Large scale semi-supervised linear svms. In Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval, 477–484. ACM.

[Sindhwani, Keerthi, and Chapelle 2006] Sindhwani, V.; Keerthi, S. S.; and Chapelle, O. 2006. Deterministic annealing for semisupervised kernel machines. In Proceedings of the 23rd international conference on Machine learning, 841–848. ACM.

[Singh, Nowak, and Zhu 2008] Singh, A.; Nowak, R. D.; and Zhu, X. 2008. Unlabeled data: Now it helps, now it doesn’t. In Koller, D.; Schuurmans, D.; Bengio, Y.; and Bottou, L., eds., NIPS, 1513– 1520. Curran Associates, Inc.

[Szegedy et al. 2015] Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; and Rabinovich, A. 2015. Going deeper with convolutions. In 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1–9.

[Wong et al. 2016] Wong, S. C.; Gatt, A.; Stamatescu, V.; and McDonnell, M. D. 2016. Understanding data augmentation for classification: when to warp? CoRR abs/1609.08764.

[Yang et al. 2011] Yang, J.; Yu, X.; Xie, Z.-Q.; and Zhang, J.-P. 2011. A novel virtual sample generation method based on gaussian distribution. Knowledge-Based Systems 24(6):740 – 748.

[Zhang et al. 2017] Zhang, H.; Cissé, M.; Dauphin, Y. N.; and Lopez-Paz, D. 2017. mixup: Beyond empirical risk minimization. CoRR abs/1710.09412.

[Zheng and Skillicorn 2016] Zheng, Q., and Skillicorn, D. 2016. Spectral graph-based semi-supervised learning for imbalanced classes. 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM) 960–967.

[Zhu, Ghahramani, and Lafferty 2003] Zhu, X.; Ghahramani, Z.; and Lafferty, J. 2003. Semi-supervised learning using gaussian fields and harmonic functions. In ICML, 912–919.

17 Fairness Behind a Veil of Ignorance: A Welfare Analysis for Automated Decision Making

SIG Constant Relative Risk Aversion (CRRA) Equally Distributed Equivalent (EDE)
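The Equally Distributed Equivalent (EDE) under Constant Relative Risk Aversion is the Atkinson-style aggregate EDE = (mean(y_i^{1−η}))^{1/(1−η)}: the equal benefit level a society would accept in exchange for the actual distribution y. A small sketch, assuming positive benefits; the η = 1 geometric-mean limit is included for completeness, and all names are illustrative:

```python
import numpy as np

def ede(y, eta):
    """Equally Distributed Equivalent of benefits y under CRRA aversion eta."""
    y = np.asarray(y, dtype=float)
    if eta == 1.0:                        # limiting case: geometric mean
        return float(np.exp(np.mean(np.log(y))))
    return float(np.mean(y ** (1.0 - eta)) ** (1.0 / (1.0 - eta)))

print(ede([1.0, 1.0, 1.0], eta=2.0))  # 1.0: an equal distribution is its own EDE
print(ede([0.5, 1.5], eta=2.0))       # 0.75 < mean 1.0: inequality lowers the EDE
```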

References

Yoram Amiel and Frank A. Cowell. Inequality, welfare and monotonicity. In Inequality, Welfare and Poverty: Theory and Measurement, pages 35–46. Emerald Group Publishing Limited, 2003.

Julia Angwin, Jeff Larson, Surya Mattu, and Lauren Kirchner. Machine bias. ProPublica, 2016.

Anthony B. Atkinson. On the measurement of inequality. Journal of Economic Theory, 2(3):244–263, 1970.

Anna Barry-Jester, Ben Casselman, and Dana Goldstein. The new science of sentencing. The Marshall Project, August 2015.

Toon Calders, Asim Karim, Faisal Kamiran, Wasif Ali, and Xiangliang Zhang. Controlling attribute effect in linear regression. In Proceedings of the International Conference on Data Mining, pages 71–80. IEEE, 2013.

Fredrik Carlsson, Dinky Daruvala, and Olof Johansson-Stenman. Are people inequality-averse, or just risk-averse? Economica, 72(287):375–396, 2005.

Sam Corbett-Davies, Emma Pierson, Avi Feller, Sharad Goel, and Aziz Huq. Algorithmic decision making and the cost of fairness. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 797–806. ACM, 2017.

Frank A. Cowell and Erik Schokkaert. Risk perceptions and distributional judgments. European Economic Review, 45(4-6):941–952, 2001.

Camilo Dagum. On the relationship between income inequality measures and social welfare functions. Journal of Econometrics, 43(1-2):91–102, 1990.

Hugh Dalton. The measurement of the inequality of incomes. The Economic Journal, 30(119):348–361, 1920.

Gerard Debreu. Topological methods in cardinal utility theory. Technical report, Cowles Foundation for Research in Economics, Yale University, 1959.

Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard Zemel. Fairness through awareness. In Proceedings of the Innovations in Theoretical Computer Science Conference, pages 214–226. ACM, 2012.

Michael Feldman, Sorelle A. Friedler, John Moeller, Carlos Scheidegger, and Suresh Venkatasubramanian. Certifying and removing disparate impact. In Proceedings of the International Conference on Knowledge Discovery and Data Mining, pages 259–268. ACM, 2015.

Samuel Freeman. Original position. In Edward N. Zalta, editor, The Stanford Encyclopedia of Philosophy. Metaphysics Research Lab, Stanford University, winter 2016 edition, 2016.

William M. Gorman. The structure of utility functions. The Review of Economic Studies, 35(4):367–390, 1968.

Moritz Hardt, Eric Price, and Nati Srebro. Equality of opportunity in supervised learning. In Proceedings of Advances in Neural Information Processing Systems, pages 3315–3323, 2016.

John C. Harsanyi. Cardinal utility in welfare economics and in the theory of risk-taking. Journal of Political Economy, 61(5):434{435, 1953.

John C. Harsanyi. Cardinal welfare, individualistic ethics, and interpersonal comparisons of utility. Journal of Political Economy, 63(4):309–321, 1955.

Daniel Kahneman and Amos Tversky. Prospect theory: An analysis of decision under risk. In Handbook of the Fundamentals of Financial Decision Making: Part I, pages 99–127. World Scientific, 2013.

Faisal Kamiran and Toon Calders. Classifying without discriminating. In Proceedings of the 2nd International Conference on Computer, Control and Communication, pages 1–6. IEEE, 2009.

Toshihiro Kamishima, Shotaro Akaho, and Jun Sakuma. Fairness-aware learning through regularization approach. In Proceedings of the International Conference on Data Mining Workshops, pages 643–650. IEEE, 2011.

Jon Kleinberg, Sendhil Mullainathan, and Manish Raghavan. Inherent trade-offs in the fair determination of risk scores. In Proceedings of the 8th Innovations in Theoretical Computer Science Conference, 2017.

Jeff Larson, Surya Mattu, Lauren Kirchner, and Julia Angwin. Data and analysis for 'How we analyzed the COMPAS recidivism algorithm'. https://github.com/propublica/compas-analysis, 2016.

Sam Levin. A beauty contest was judged by AI and the robots didn't like dark skin. The Guardian, 2016.

M. Lichman. UCI machine learning repository: Communities and crime data set. http://archive.ics.uci.edu/ml/datasets/Communities+and+Crime, 2013.

Clair Miller. Can an algorithm hire better than a human? The New York Times, June 25 2015. Retrieved 4/28/2016.

Hervé Moulin. Fair division and collective welfare. MIT Press, 2004.

Kevin Petrasic, Benjamin Saul, James Greig, and Matthew Bornfreund. Algorithms and bias: What lenders need to know. White & Case, 2017.

Arthur Cecil Pigou. Wealth and welfare. Macmillan and Company, limited, 1912.

John Rawls. A theory of justice. Harvard University Press, 2009.

Kevin W. S. Roberts. Interpersonal comparability and social choice theory. The Review of Economic Studies, pages 421–439, 1980.

Cynthia Rudin. Predictive policing using machine learning to detect patterns of crime. Wired Magazine, August 2013. Retrieved 4/28/2016.

Joseph Schwartz and Christopher Winship. The welfare approach to measuring inequality. Sociological Methodology, 11:1–36, 1980.

Amartya Sen. On weights and measures: informational constraints in social welfare analysis. Econometrica: Journal of the Econometric Society, pages 1539–1572, 1977.

Till Speicher, Hoda Heidari, Nina Grgic-Hlaca, Krishna P. Gummadi, Adish Singla, Adrian Weller, and Muhammad Bilal Zafar. A unified approach to quantifying algorithmic unfairness: Measuring individual and group unfairness via inequality indices. In Proceedings of the International Conference on Knowledge Discovery and Data Mining, 2018.

Latanya Sweeney. Discrimination in online ad delivery. Queue, 11(3):10, 2013.

Hal R. Varian. Equity, envy, and efficiency. Journal of Economic Theory, 9(1):63–91, 1974.

Muhammad Bilal Zafar, Isabel Valera, Manuel Gomez Rodriguez, and Krishna P Gummadi. Fairness beyond disparate treatment & disparate impact: Learning classification without disparate mistreatment. In Proceedings of the 26th International Conference on World Wide Web, pages 1171–1180, 2017.

Muhammad Bilal Zafar, Isabel Valera, Manuel Gomez Rodriguez, and Krishna P. Gummadi. Fairness constraints: Mechanisms for fair classification. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, 2017.

Muhammad Bilal Zafar, Isabel Valera, Manuel Rodriguez, Krishna Gummadi, and Adrian Weller. From parity to preference-based notions of fairness in classification. In Proceedings of Advances in Neural Information Processing Systems, pages 228–238, 2017.

18 Fairness Through Awareness

AALIM AIDS. CA DMNS ECGs HIV
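The paper's central definition asks that a randomized classifier M be Lipschitz with respect to a task-specific similarity metric d: D(M(x), M(y)) ≤ d(x, y) for all pairs of individuals. A small checker sketch using total variation as the distance D on output distributions; d(x, y) is assumed to be supplied externally, and all names are illustrative:

```python
import numpy as np

def violates_lipschitz_fairness(px, py, d_xy):
    """Check the 'fairness through awareness' Lipschitz condition for one pair.

    px, py -- output distributions M(x), M(y) over the classes (each sums to 1)
    d_xy   -- task-specific similarity distance d(x, y), assumed given
    Uses total variation distance as the distance D on distributions.
    """
    tv = 0.5 * np.abs(np.asarray(px) - np.asarray(py)).sum()
    return tv > d_xy

# Similar individuals (d = 0.05) must receive near-identical distributions:
print(violates_lipschitz_fairness([0.7, 0.3], [0.6, 0.4], d_xy=0.05))  # True
```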

References

[AAL] AALIM. http://www.almaden.ibm.com/cs/projects/aalim/.

[AAN+98] Miklos Ajtai, James Aspnes, Moni Naor, Yuval Rabani, Leonard J. Schulman, and Orli Waarts. Fairness in scheduling. Journal of Algorithms, 29(2):306–357, November 1998.

[ABC+05] Ittai Abraham, Yair Bartal, Hubert T.-H. Chan, Kedar Dhamdhere, Anupam Gupta, Jon M. Kleinberg, Ofer Neiman, and Aleksandrs Slivkins. Metric embeddings with relaxed guarantees. In FOCS, pages 83–100. IEEE, 2005.

[BS06] Nikhil Bansal and Maxim Sviridenko. The santa claus problem. In Proc. 38th STOC, pages 31–40. ACM, 2006.

[Cal05] Catarina Calsamiglia. Decentralizing equality of opportunity and issues concerning the equality of educational opportunity, 2005. Doctoral Dissertation, Yale University.

[CG08] T.-H. Hubert Chan and Anupam Gupta. Approximating TSP on metrics with bounded global growth. In Proc. 19th Symposium on Discrete Algorithms (SODA), pages 690–699. ACM-SIAM, 2008.

[CKNZ04] Chandra Chekuri, Sanjeev Khanna, Joseph Naor, and Leonid Zosin. A linear programming formulation and approximation algorithms for the metric labeling problem. SIAM J. Discrete Math., 18(3):608–625, 2004.

[DMNS06] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In Proc. 3rd TCC, pages 265–284. Springer, 2006.

[Dwo06] Cynthia Dwork. Differential privacy. In Proc. 33rd ICALP, pages 1–12. Springer, 2006.

[Fei08] Uri Feige. On allocations that maximize fairness. In Proc. 19th Symposium on Discrete Algorithms (SODA), pages 287–293. ACM-SIAM, 2008.

[FT11] Uri Feige and Moshe Tennenholtz. Mechanism design with uncertain inputs (to err is human, to forgive divine). In Proc. 43rd STOC, pages 549–558. ACM, 2011.

[HT10] Moritz Hardt and Kunal Talwar. On the geometry of differential privacy. In Proc. 42nd STOC. ACM, 2010.

[Hun05] D. Bradford Hunt. Redlining. Encyclopedia of Chicago, 2005.

[JM09] Carter Jernigan and Behram F.T. Mistree. Gaydar: Facebook friendships expose sexual orientation. First Monday, 14(10), 2009.

[KT02] Jon M. Kleinberg and Éva Tardos. Approximation algorithms for classification problems with pairwise relationships: metric labeling and Markov random fields. Journal of the ACM (JACM), 49(5):616–639, 2002.

[MT07] Frank McSherry and Kunal Talwar. Mechanism design via differential privacy. In Proc. 48th Foundations of Computer Science (FOCS), pages 94–103. IEEE, 2007.

[Rab93] M. Rabin. Incorporating fairness into game theory and economics. The American Economic Review, 83:1281–1302, 1993.

[Raw01] John Rawls. Justice as Fairness, A Restatement. Belknap Press, 2001.

[SA10] Emily Steel and Julia Angwin. On the web’s cutting edge, anonymity in name only. The Wall Street Journal, 2010.

[SM10] Leslie Scism and Mark Maremont. Insurers test data profiles to identify risky clients. The Wall Street Journal, 2010.

[You95] H. Peyton Young. Equity. Princeton University Press, 1995.

[Zar11] Tal Zarsky. Private communication. 2011.

19 Fairness Through Computationally-Bounded Awareness

COLT ICML ITCS NIPS SIAM

References

[1] Julia Angwin, Jeff Larson, Surya Mattu, and Lauren Kirchner. Machine bias: There’s software used across the country to predict future criminals. And it’s biased against blacks. ProPublica, 2016.

[2] Andrej Bogdanov and Alon Rosen. Pseudorandom functions: Three decades later. In Tutorials on the Foundations of Cryptography, pages 79–158. Springer, 2017.

[3] Joy Buolamwini and Timnit Gebru. Gender shades: Intersectional accuracy disparities in commercial gender classification. In Conference on Fairness, Accountability and Transparency, pages 77–91, 2018.

[4] Alexandra Chouldechova. Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Big Data, 2017.

[5] Sam Corbett-Davies, Emma Pierson, Avi Feller, Sharad Goel, and Aziz Huq. Algorithmic decision making and the cost of fairness. KDD, 2017.

[6] Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard S. Zemel. Fairness through awareness. In Innovations in Theoretical Computer Science (ITCS), pages 214– 226, 2012.

[7] Vitaly Feldman. Distribution-specific agnostic boosting. In Proceedings of the First Symposium on Innovations in Computer Science, 2010.

[8] Vitaly Feldman, Venkatesan Guruswami, Prasad Raghavendra, and Yi Wu. Agnostic learning of monomials by halfspaces is hard. SIAM Journal on Computing, 41(6):1558–1590, 2012.

[9] Avi Feller, Emma Pierson, Sam Corbett-Davies, and Sharad Goel. A computer program used for bail and sentencing decisions was labeled biased against blacks. It’s actually not that clear. The Washington Post, 2016.

[10] Stephen Gillen, Christopher Jung, Michael J. Kearns, and Aaron Roth. Online learning with an unknown fairness metric. arXiv preprint arXiv: https://arxiv.org/abs/1802.06936, 2018.

[11] Oded Goldreich, Shafi Goldwasser, and Silvio Micali. How to construct random functions. In Foundations of Computer Science, 1984. 25th Annual Symposium on, pages 464–479. IEEE, 1984.

[12] Parikshit Gopalan, Adam Tauman Kalai, and Adam R Klivans. Agnostically learning decision trees. In Proceedings of the fortieth annual ACM symposium on Theory of computing, pages 527–536. ACM, 2008.

[13] Moritz Hardt, Eric Price, and Nathan Srebro. Equality of opportunity in supervised learning. In Advances in Neural Information Processing Systems, pages 3315–3323, 2016.

[14] Úrsula Hébert-Johnson, Michael P. Kim, Omer Reingold, and Guy N. Rothblum. Calibration for the (computationally-identifiable) masses. ICML, 2018.

[15] Michael Kearns. Efficient noise-tolerant learning from statistical queries. Journal of the ACM (JACM), 45(6):983–1006, 1998.

[16] Michael Kearns, Seth Neel, Aaron Roth, and Zhiwei Steven Wu. Preventing fairness gerrymandering: Auditing and learning for subgroup fairness. ICML, 2018.

[17] Michael J. Kearns, Robert E. Schapire, and Linda M. Sellie. Toward efficient agnostic learning. Machine Learning, 17(2-3):115–141, 1994.

[18] Michael P. Kim, Amirata Ghorbani, and James Zou. Multiaccuracy: Black-box postprocessing for fairness in classification. arXiv preprint arXiv: https://arxiv.org/abs/1805.12317, 2018.

[19] Jon Kleinberg, Sendhil Mullainathan, and Manish Raghavan. Inherent trade-offs in the fair determination of risk scores. ITCS, 2017.

[20] Yurii Nesterov. Primal-dual subgradient methods for convex problems. Mathematical Programming, 120(1):221–259, 2009.

[21] Geoff Pleiss, Manish Raghavan, Felix Wu, Jon Kleinberg, and Kilian Q. Weinberger. On fairness and calibration. NIPS, 2017.

[22] Guy N. Rothblum and Gal Yona. Probably approximately metric-fair learning. ICML, 2018.

[23] Leslie G. Valiant. A theory of the learnable. Communications of the ACM, 27(11):1134–1142, 1984.

[24] Kaveh Waddell. How algorithms can bring down minorities’ credit scores. The Atlantic, 2016.

[25] Blake Woodworth, Suriya Gunasekar, Mesrob I Ohannessian, and Nathan Srebro. Learning non-discriminatory predictors. COLT, 2017.

20 Fairness Without Demographics in Repeated Loss Minimization

distributionally robust optimization (DRO) empirical risk minimization (ERM)
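DRO replaces the ERM average with the worst-case risk over distributions near the training distribution, which bounds the loss of any α-sized subpopulation without needing group labels. The sketch below uses a CVaR-style surrogate (the average loss of the worst α-fraction of examples) rather than the paper's exact chi-squared-ball dual, so treat it as an illustration of the idea only; all names are hypothetical:

```python
import numpy as np

def erm_risk(losses):
    """Empirical risk minimization objective: the plain average loss."""
    return float(np.mean(losses))

def dro_risk(losses, alpha=0.2):
    """CVaR-style DRO surrogate: mean loss over the worst alpha-fraction
    of examples (a simplification of the paper's chi-squared-ball DRO)."""
    losses = np.sort(np.asarray(losses))[::-1]       # worst losses first
    k = max(1, int(np.ceil(alpha * len(losses))))
    return float(np.mean(losses[:k]))

losses = [0.1, 0.2, 0.1, 2.0, 0.15]   # one badly served minority example
print(erm_risk(losses))               # 0.51 -- the average hides the outlier
print(dro_risk(losses, alpha=0.2))    # 2.0  -- the worst 20% dominates
```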

References

Altham, J. J. Rawls’ difference principle. Philosophy, 48:75–78, 1973.

Amodei, D. et al. Deep Speech 2: End-to-end speech recognition in English and Mandarin. In International Conference on Machine Learning (ICML), pp. 173–182, 2016.

Barocas, S. and Selbst, A. D. Big data’s disparate impact. 104 California Law Review, 3:671–732, 2016.

Ben-Tal, A., den Hertog, D., Waegenaere, A. D., Melenberg, B., and Rennen, G. Robust solutions of optimization problems affected by uncertain probabilities. Management Science, 59(2):341–357, 2013.

Blodgett, S. L., Green, L., and O’Connor, B. Demographic dialectal variation in social media: A case study of African-American English. In Empirical Methods in Natural Language Processing (EMNLP), pp. 1119–1130, 2016.

Chouldechova, A. A study of bias in recidivism prediction instruments. Big Data, pp. 153–163, 2017.

Duchi, J. C. and Namkoong, H. Variance-based regularization with convex objectives. arXiv: https://arxiv.org/abs/1610.02581 [stat.ML], 2016.

Duchi, J. C. and Namkoong, H. Distributionally robust stochastic optimization: Minimax rates and asymptotics. Working Paper, 2018.

Duchi, J. C., Glynn, P. W., and Namkoong, H. Statistics of robust optimization: A generalized empirical likelihood approach. arXiv:1610.03425 [stat.ML], 2016. URL https://arxiv.org/abs/1610.03425.

Dwork, C., Hardt, M., Pitassi, T., Reingold, O., and Zemel, R. Fairness through awareness. In Innovations in Theoretical Computer Science (ITCS), pp. 214–226, 2012.

Feldman, M., Friedler, S., Moeller, J., Scheidegger, C., and Venkatasubramanian, S. Certifying and removing disparate impact. In International Conference on Knowledge Discovery and Data Mining (KDD), pp. 259–268, 2015.

Fuster, A., Goldsmith-Pinkham, P., Ramadorai, T., and Walther, A. Predictably unequal? The effects of machine learning on credit markets. Technical report, CEPR Discussion Papers, 2017.

Grother, P. J., Quinn, G. W., and Phillips, P. J. Report on the evaluation of 2d still-image face recognition algorithms. Technical report, NIST, 2011.

Hardt, M., Price, E., and Srebro, N. Equality of opportunity in supervised learning. In Advances in Neural Information Processing Systems (NIPS), pp. 3315–3323, 2016.

Harsanyi, J. C. Can the maximin principle serve as a basis for morality? A critique of John Rawls’s theory. The American Political Science Review, 69:594–606, 1975.

Hébert-Johnson, Ú., Kim, M. P., Reingold, O., and Rothblum, G. N. Calibration for the (computationally-identifiable) masses. arXiv preprint arXiv: https://arxiv.org/abs/1711.08513, 2017.

Hovy, D. and Søgaard, A. Tagging performance correlates with age. In Association for Computational Linguistics (ACL), pp. 483–488, 2015.

Hu, W., Niu, G., Sato, I., and Sugiyama, M. Does distributionally robust supervised learning give robust classifiers? In International Conference on Machine Learning (ICML), 2018.

Jabbari, S., Joseph, M., Kearns, M., Morgenstern, J., and Roth, A. Fairness in reinforcement learning. In International Conference on Machine Learning (ICML), pp. 1617–1626, 2017.

Joseph, M., Kearns, M., Morgenstern, J., Neel, S., and Roth, A. Rawlsian fairness for machine learning. In FATML, 2016.

Jurgens, D., Tsvetkov, Y., and Jurafsky, D. Incorporating dialectal variability for socially equitable language identification. In Association for Computational Linguistics (ACL), pp. 51–57, 2017.

Kearns, M., Neel, S., Roth, A., and Wu, Z. S. Preventing fairness gerrymandering: Auditing and learning for subgroup fairness. arXiv preprint arXiv: https://arxiv.org/abs/1711.05144, 2018.

Kleinberg, J., Mullainathan, S., and Raghavan, M. Inherent trade-offs in the fair determination of risk scores. In Innovations in Theoretical Computer Science (ITCS), 2017.

Lam, H. and Zhou, E. Quantifying input uncertainty in stochastic optimization. In Proceedings of the 2015 Winter Simulation Conference. IEEE, 2015.

Liu, L. T., Dean, S., Rolf, E., Simchowitz, M., and Hardt, M. Delayed impact of fair machine learning. arXiv preprint arXiv: https://arxiv.org/abs/1803.04383, 2018.

Luo, A. C. Regularity and complexity in dynamical systems. Springer, 2012.

Mueller, D. C., Tollison, R. D., and Willet, T. D. The utilitarian contract: A generalization of rawls’ theory of justice. Theory and Decision, 4:345–367, 1974.

Namkoong, H. and Duchi, J. C. Stochastic gradient methods for distributionally robust optimization with f-divergences. In Advances in Neural Information Processing Systems 29, 2016.

Namkoong, H. and Duchi, J. C. Variance regularization with convex objectives. In Advances in Neural Information Processing Systems 30, 2017.

Rawls, J. Justice as fairness: a restatement. Harvard University Press, 2001.

Rawls, J. A theory of justice: Revised edition. Harvard University Press, 2009.

Sapiezynski, P., Kassarnig, V., Wilson, C., Lehmann, S., and Mislove, A. Academic performance prediction in a gender-imbalanced environment. In FATREC, volume 1, pp. 48–51, 2017.

Tatman, R. Gender and dialect bias in YouTube’s automatic captions. In Workshop on Ethics in Natural Language Processing, volume 1, pp. 53–59, 2017.

Woodworth, B., Gunasekar, S., Ohannessian, M. I., and Srebro, N. Learning non-discriminatory predictors. In Conference on Learning Theory (COLT), pp. 1920–1953, 2017.

21 Focal Loss for Dense Object Detection

COCO HOG SSD YOLO

cross entropy (CE) Focal Loss (FL) Feature Pyramid Network (FPN) online hard example mining (OHEM)

Facebook AI Research (FAIR)
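The paper's focal loss down-weights well-classified examples: FL(p_t) = −α_t (1 − p_t)^γ log(p_t), which reduces to the cross entropy (CE) loss at γ = 0. A minimal binary-classification sketch in NumPy (the helper name is illustrative):

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t).

    p     -- predicted probability of the positive class
    y     -- ground-truth label in {0, 1}
    gamma -- focusing parameter (gamma = 0 recovers cross entropy)
    alpha -- class-balance weight for the positive class
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)

print(focal_loss(0.9, 1))  # easy example: loss shrunk by (1-0.9)**2 = 0.01
print(focal_loss(0.1, 1))  # hard example: loss barely down-weighted
```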

References

[1] S. Bell, C. L. Zitnick, K. Bala, and R. Girshick. Inside-outside net: Detecting objects in context with skip pooling and recurrent neural networks. In CVPR, 2016. 6

[2] S. R. Bulo, G. Neuhold, and P. Kontschieder. Loss max-pooling for semantic image segmentation. In CVPR, 2017. 3

[3] J. Dai, Y. Li, K. He, and J. Sun. R-FCN: Object detection via region-based fully convolutional networks. In NIPS, 2016. 1

[4] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In CVPR, 2005. 2

[5] P. Dollár, Z. Tu, P. Perona, and S. Belongie. Integral channel features. In BMVC, 2009. 2, 3

[6] D. Erhan, C. Szegedy, A. Toshev, and D. Anguelov. Scalable object detection using deep neural networks. In CVPR, 2014. 2

[7] M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman. The PASCAL Visual Object Classes (VOC) Challenge. IJCV, 2010. 2

[8] P. F. Felzenszwalb, R. B. Girshick, and D. McAllester. Cascade object detection with deformable part models. In CVPR, 2010. 2, 3

[9] C.-Y. Fu, W. Liu, A. Ranga, A. Tyagi, and A. C. Berg. DSSD: Deconvolutional single shot detector. arXiv: https://arxiv.org/abs/1701.06659, 2016. 1, 2, 8

[10] R. Girshick. Fast R-CNN. In ICCV, 2015. 1, 2, 4, 6, 8

[11] R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR, 2014. 1, 2, 5

[12] R. Girshick, I. Radosavovic, G. Gkioxari, P. Dollár, and K. He. Detectron. https://github.com/facebookresearch/detectron, 2018. 8

[13] T. Hastie, R. Tibshirani, and J. Friedman. The elements of statistical learning. Springer Series in Statistics. Springer, Berlin, 2008. 3, 7

[14] K. He, G. Gkioxari, P. Dollár, and R. Girshick. Mask R-CNN. In ICCV, 2017. 1, 2, 4

[15] K. He, X. Zhang, S. Ren, and J. Sun. Spatial pyramid pooling in deep convolutional networks for visual recognition. In ECCV. 2014. 2

[16] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In CVPR, 2016. 2, 4, 5, 6, 8

[17] J. Huang, V. Rathod, C. Sun, M. Zhu, A. Korattikara, A. Fathi, I. Fischer, Z. Wojna, Y. Song, S. Guadarrama, and K. Murphy. Speed/accuracy trade-offs for modern convolutional object detectors. In CVPR, 2017. 2, 8

[18] A. Krizhevsky, I. Sutskever, and G. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, 2012. 2

[19] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel. Backpropagation applied to handwritten zip code recognition. Neural computation, 1989. 2

[20] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie. Feature pyramid networks for object detection. In CVPR, 2017. 1, 2, 4, 5, 6, 8

[21] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft COCO: Common objects in context. In ECCV, 2014. 1, 6

[22] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, and S. Reed. SSD: Single shot multibox detector. In ECCV, 2016. 1, 2, 3, 6, 7, 8

[23] J. Long, E. Shelhamer, and T. Darrell. Fully convolutional networks for semantic segmentation. In CVPR, 2015. 4

[24] P. O. Pinheiro, R. Collobert, and P. Dollár. Learning to segment object candidates. In NIPS, 2015. 2, 4

[25] P. O. Pinheiro, T.-Y. Lin, R. Collobert, and P. Dollár. Learning to refine object segments. In ECCV, 2016. 2

[26] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi. You only look once: Unified, real-time object detection. In CVPR, 2016. 1, 2

[27] J. Redmon and A. Farhadi. YOLO9000: Better, faster, stronger. In CVPR, 2017. 1, 2, 8

[28] S. Ren, K. He, R. Girshick, and J. Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. In NIPS, 2015. 1, 2, 4, 5, 8

[29] H. Rowley, S. Baluja, and T. Kanade. Human face detection in visual scenes. Technical Report CMU-CS-95-158R, Carnegie Mellon University, 1995. 2

[30] P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, and Y. LeCun. Overfeat: Integrated recognition, localization and detection using convolutional networks. In ICLR, 2014. 2

[31] A. Shrivastava, A. Gupta, and R. Girshick. Training region-based object detectors with online hard example mining. In CVPR, 2016. 2, 3, 6, 7

[32] A. Shrivastava, R. Sukthankar, J. Malik, and A. Gupta. Beyond skip connections: Top-down modulation for object detection. arXiv: https://arxiv.org/abs/1612.06851, 2016. 2, 8

[33] K.-K. Sung and T. Poggio. Learning and Example Selection for Object and Pattern Detection. In MIT A.I. Memo No. 1521, 1994. 2, 3

[34] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. A. Alemi. Inception-v4, inception-resnet and the impact of residual connections on learning. In AAAI Conference on Artificial Intelligence, 2017. 8

[35] J. R. Uijlings, K. E. van de Sande, T. Gevers, and A. W. Smeulders. Selective search for object recognition. IJCV, 2013. 2, 4

[36] R. Vaillant, C. Monrocq, and Y. LeCun. Original approach for the localisation of objects in images. IEE Proc. on Vision, Image, and Signal Processing, 1994. 2

[37] P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. In CVPR, 2001. 2, 3

[38] S. Xie, R. Girshick, P. Doll´ar, Z. Tu, and K. He. Aggregated residual transformations for deep neural networks. In CVPR, 2017. 8

[39] C. L. Zitnick and P. Dollár. Edge boxes: Locating object proposals from edges. In ECCV, 2014. 2

22 Generative Adversarial Minority Oversampling

Conditional Transient Mapping Unit (cTMU) deep oversampling framework (DOS) Generative Adversarial Minority Oversampling (GAMO) Generative adversarial networks (GANs)

References

[1] S. Ando and C. Y. Huang. Deep over-sampling framework for classifying imbalanced data. In Machine Learning and Knowledge Discovery in Databases, pages 770–785. Springer International Publishing, 2017. 2

[2] S. Barua, M. M. Islam, X. Yao, and K. Murase. MWMOTE – majority weighted minority oversampling technique for imbalanced data set learning. IEEE Transactions on Knowledge and Data Engineering, 26(2):405–425, 2014. 2

[3] D. Berthelot, T. Schumm, and L. Metz. BEGAN: Boundary equilibrium generative adversarial networks. arXiv preprint arXiv: https://arxiv.org/abs/1703.10717, 2017. 8

[4] P. Branco, L. Torgo, and R. P. Ribeiro. A survey of predictive modeling on imbalanced domains. ACM Computing Surveys (CSUR), 49(2):31, 2016. 1, 5

[5] M. Buda, A. Maki, and M. A. Mazurowski. A systematic study of the class imbalance problem in convolutional neural networks. Neural Networks, 106:249–259, 2018. 1

[6] S. R. Bulo, G. Neuhold, and P. Kontschieder. Loss max-pooling for semantic image segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2126–2135, 2017. 1

[7] C. Bunkhumpornpat, K. Sinapiromsaran, and C. Lursinsap. Safe-level-smote: Safe-level-synthetic minority oversampling technique for handling the class imbalanced problem. In Advances in Knowledge Discovery and Data Mining, pages 475–482, 2009. 2

[8] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer. SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16:321–357, 2002. 2

[9] N. V. Chawla, A. Lazarevic, L. O. Hall, and K. W. Bowyer. Smoteboost: Improving prediction of the minority class in boosting. In Knowledge Discovery in Databases: PKDD 2003, pages 107–119, 2003. 2

[10] Y.-A. Chung, H.-T. Lin, and S.-W. Yang. Cost-aware pretraining for multiclass cost-sensitive deep learning. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, IJCAI’16, pages 1411–1417. AAAI Press, 2016. 1

[11] S. Das, S. Datta, and B. B. Chaudhuri. Handling data irregularities in classification: Foundations, trends, and future challenges. Pattern Recognition, 81:674–693, 2018. 1

[12] Q. Dong, S. Gong, and X. Zhu. Imbalanced deep learning by minority class incremental rectification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018. 1

[13] G. Douzas and F. Bacao. Effective data generation for imbalanced learning using conditional generative adversarial networks. Expert Systems with applications, 91:464–471, 2018. 2

[14] A. Fernández, S. Garcia, F. Herrera, and N. V. Chawla. Smote for learning from imbalanced data: Progress and challenges, marking the 15-year anniversary. Journal of Artificial Intelligence Research, 61:863–905, 2018. 2

[15] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In Advances in neural information processing systems, pages 2672–2680, 2014. 2

[16] S. Gurumurthy, R. Kiran Sarvadevabhatla, and R. Venkatesh Babu. DeLiGAN: Generative adversarial networks for diverse and limited data. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 166–174, 2017. 9

[17] H. Han, W.-Y. Wang, and B.-H. Mao. Borderline-smote: A new over-sampling method in imbalanced data sets learning. In Advances in Intelligent Computing, pages 878–887, 2005. 2

[18] H. He, Y. Bai, E. A. Garcia, and S. Li. ADASYN: Adaptive synthetic sampling approach for imbalanced learning. In IEEE International Joint Conference on Neural Networks, pages 1322–1328, 2008. 2

[19] H. He and E. A. Garcia. Learning from imbalanced data. IEEE Transactions on Knowledge and Data Engineering, 21(9):1263–1284, 2009. 1

[20] M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter. Gans trained by a two time-scale update rule converge to a local nash equilibrium. In Advances in Neural Information Processing Systems 30, pages 6626–6637, 2017.

[21] C. Huang, Y. Li, C. Change Loy, and X. Tang. Learning deep representation for imbalanced classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5375–5384, 2016. 1, 5

[22] S. H. Khan, M. Hayat, M. Bennamoun, F. A. Sohel, and R. Togneri. Cost-sensitive learning of deep feature representations from imbalanced data. IEEE Transactions on Neural Networks and Learning Systems, 29(8):3573–3587, 2018. 1

[23] D. P. Kingma and M. Welling. Auto-encoding variational bayes. arXiv preprint arXiv: https://arxiv.org/abs/1312.6114, 2013. 8

[24] B. Krawczyk. Learning from imbalanced data: open challenges and future directions. Progress in Artificial Intelligence, 5(4):221–232, 2016. 1

[25] A. Krizhevsky. Learning multiple layers of features from tiny images. Technical Report TR-2009, University of Toronto, 2009. 5

[26] M. Kubat, S. Matwin, et al. Addressing the curse of imbalanced training sets: one-sided selection. In ICML, volume 97, pages 179–186, 1997. 5

[27] A. Kumar, P. Sattigeri, and T. Fletcher. Semi-supervised learning with gans: Manifold invariance with improved inference. In Advances in Neural Information Processing Systems, pages 5534–5544, 2017. 2

[28] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998. 5

[29] M. Lin, K. Tang, and X. Yao. Dynamic sampling approach to training neural networks for multiclass imbalance classification. IEEE Transactions on Neural Networks and Learning Systems, 24(4):647–660, 2013. 2

[30] T. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár. Focal loss for dense object detection. In 2017 IEEE International Conference on Computer Vision (ICCV), pages 2999–3007. IEEE, 2017. 1

[31] X. Mao, Q. Li, H. Xie, R. Y. Lau, Z. Wang, and S. Paul Smolley. Least squares generative adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, pages 2794–2802, 2017. 5, 7

[32] M. A. Mazurowski, P. A. Habas, J. M. Zurada, J. Y. Lo, J. A. Baker, and G. D. Tourassi. Training neural network classifiers for medical decision making: The effects of imbalanced datasets on classification performance. Neural networks, 21(2-3):427–436, 2008. 1

[33] M. Mirza and S. Osindero. Conditional generative adversarial nets. arXiv preprint arXiv: https://arxiv.org/abs/1411.1784, 2014. 2

[34] Y. Netzer, T. Wang, A. Coates, A. Bissacco, B. Wu, and A. Y. Ng. Reading digits in natural images with unsupervised feature learning. In NIPS workshop on deep learning and unsupervised feature learning, page 5, 2011. 5

[35] A. Odena, C. Olah, and J. Shlens. Conditional image synthesis with auxiliary classifier gans. In Proceedings of the 34th International Conference on Machine Learning - Volume 70, pages 2642–2651. JMLR.org, 2017. 2

[36] A. Radford, L. Metz, and S. Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv: https://arxiv.org/abs/1511.06434, 2015. 2

[37] D. J. Rezende, S. Mohamed, and D. Wierstra. Stochastic backpropagation and approximate inference in deep generative models. arXiv preprint arXiv: https://arxiv.org/abs/1401.4082, 2014. 8

[38] T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, and X. Chen. Improved techniques for training gans. In Advances in neural information processing systems, pages 2234–2242, 2016. 2

[39] S. Santurkar, L. Schmidt, and A. Madry. A classification-based study of covariate shift in gan distributions. In International Conference on Machine Learning, pages 4487–4496, 2018. 2

[40] M. Sokolova and G. Lapalme. A systematic analysis of performance measures for classification tasks. Information Processing & Management, 45(4):427–437, 2009. 5

[41] J. T. Springenberg. Unsupervised and semi-supervised learning with categorical generative adversarial networks. arXiv preprint arXiv: https://arxiv.org/abs/1511.06390, 2015. 2

[42] A. Srivastava, L. Valkov, C. Russell, M. U. Gutmann, and C. Sutton. Veegan: Reducing mode collapse in gans using implicit variational learning. In Advances in Neural Information Processing Systems 30, pages 3308–3318, 2017. 2

[43] S. Wang, W. Liu, J. Wu, L. Cao, Q. Meng, and P. Kennedy. Training deep neural networks on imbalanced data sets. In 2016 International Joint Conference on Neural Networks (IJCNN), pages 4368–4374. IEEE, 2016. 1

[44] Y.-X. Wang, D. Ramanan, and M. Hebert. Learning to model the tail. In Advances in Neural Information Processing Systems, pages 7029–7039, 2017. 2, 5

[45] C. Wu, L. Herranz, X. Liu, J. van de Weijer, B. Raducanu, et al. Memory replay gans: Learning to generate new categories without forgetting. In Advances in Neural Information Processing Systems, pages 5966–5976, 2018. 9

[46] H. Xiao, K. Rasul, and R. Vollgraf. Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv: https://arxiv.org/abs/1708.07747, 2017. 5

[47] J. Xiao, J. Hays, K. A. Ehinger, A. Oliva, and A. Torralba. Sun database: Large-scale scene recognition from abbey to zoo. In Computer vision and pattern recognition (CVPR), 2010 IEEE conference on, pages 3485–3492, 2010. 5

[48] S. Xie and Z. Tu. Holistically-nested edge detection. In Proceedings of the IEEE international conference on computer vision, pages 1395–1403, 2015. 1

[49] S. Xie and Z. Tu. Holistically-nested edge detection. International Journal of Computer Vision, 125(1-3):3–18, 2017.

[50] Y. Yan, M. Chen, M. Shyu, and S. Chen. Deep learning for imbalanced multimedia data classification. In 2015 IEEE International Symposium on Multimedia (ISM), pages 483– 488. IEEE, 2015. 1

[51] F. Yu, A. Seff, Y. Zhang, S. Song, T. Funkhouser, and J. Xiao. Lsun: Construction of a large-scale image dataset using deep learning with humans in the loop. arXiv preprint arXiv: https://arxiv.org/abs/1506.03365, 2015. 5

23 Group Fairness for Indivisible Goods Allocation

GF1A
GF1B
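GF1A and GF1B are the paper's group-level "up to one good" fairness relaxations. For intuition, the sketch below checks the classic single-agent analogue, envy-freeness up to one good (EF1), for additive valuations; it is an illustrative checker, not the paper's definition of GF1A/GF1B, and all names are hypothetical:

```python
def ef1(valuation, bundles):
    """Check envy-freeness up to one good (EF1) for an additive allocation.

    valuation -- valuation[i][g]: agent i's value for good g
    bundles   -- bundles[i]: list of goods allocated to agent i
    Agent i may envy agent j, but removing i's most-valued good from
    j's bundle must eliminate that envy.
    """
    n = len(bundles)
    for i in range(n):
        v_own = sum(valuation[i][g] for g in bundles[i])
        for j in range(n):
            if i == j or not bundles[j]:
                continue
            v_other = sum(valuation[i][g] for g in bundles[j])
            if v_own < v_other - max(valuation[i][g] for g in bundles[j]):
                return False   # envy persists even after dropping one good
    return True

valuation = [{0: 3, 1: 1, 2: 1}, {0: 1, 1: 2, 2: 2}]
print(ef1(valuation, bundles=[[0], [1, 2]]))  # True: this allocation is EF1
```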

References

Agarwal, A.; Beygelzimer, A.; Dudík, M.; Langford, J.; and Wallach, H. 2018. A reductions approach to fair classification. In Proc. of the 35th International Conference on Machine Learning (ICML).

Aleksandrov, M., and Walsh, T. 2018. Group envy freeness and group pareto efficiency in fair division with indivisible items. In Proc. of the 41st German Conference on AI (KI).

Babaioff, M.; Nisan, N.; and Talgam-Cohen, I. 2017. Competitive equilibria with indivisible goods and generic budgets. CoRR arXiv: https://arxiv.org/abs/1703.08150.

Barman, S.; Biswas, A.; Krishnamurthy, S.; and Narahari, Y. 2018. Groupwise maximin fair allocation of indivisible goods. In Proc. of the 32nd AAAI Conference on Artificial Intelligence (AAAI).

Barman, S.; Krishnamurthy, S. K.; and Vaish, R. 2018. Finding fair and efficient allocations. In Proc. of the 19th ACM Conference on Economics and Computation (EC).

Berliant, M.; Thomson, W.; and Dunz, K. 1992. On the fair division of a heterogeneous commodity. Journal of Mathematical Economics 21(3):201–216.

Budish, E. 2011. The combinatorial assignment problem: Approximate competitive equilibrium from equal incomes. Journal of Political Economy 119(6):1061–1103.

Calders, T., and Verwer, S. 2010. Three naive Bayes approaches for discrimination-free classification. Data Mining and Knowledge Discovery 21(2):277–292.

Caragiannis, I.; Kurokawa, D.; Moulin, H.; Procaccia, A. D.; Shah, N.; and Wang, J. 2016. The unreasonable fairness of maximum Nash welfare. In Proc. of the 17th ACM Conference on Economics and Computation (EC).

Conitzer, V.; Freeman, R.; and Shah, N. 2017. Fair public decision making. In Proc. of the 18th ACM Conference on Economics and Computation (EC).

Fain, B.; Goel, A.; and Munagala, K. 2016. The core of the participatory budgeting problem. In Proc. of the 12th Conference on Web and Internet Economics (WINE).

Fain, B.; Munagala, K.; and Shah, N. 2018. Fair allocation of indivisible public goods. In Proc. of the 19th ACM Conference on Economics and Computation (EC).

Foley, D. 1967. Resource allocation and the public sector. Yale Economics Essays 7:45–98.

Hardt, M.; Price, E.; and Srebro, N. 2016. Equality of opportunity in supervised learning. In Advances in Neural Information Processing Systems 29.

Hebert-Johnson, U.; Kim, M.; Reingold, O.; and Rothblum, G. 2018. Multicalibration: Calibration for the (computationally-identifiable) masses. In Proc. of the 35th International Conference on Machine Learning (ICML).

Husseinov, F. 2011. A theory of a heterogeneous divisible commodity exchange economy. Journal of Mathematical Economics 47(1):54–59.

Kamiran, F., and Calders, T. 2012. Data preprocessing techniques for classification without discrimination. Knowledge and Information Systems 33(1):1–33.

Kearns, M.; Neel, S.; Roth, A.; and Wu, Z. S. 2018. Preventing fairness gerrymandering: Auditing and learning for subgroup fairness. In Proc. of the 35th International Conference on Machine Learning (ICML).

Kleinberg, J.; Mullainathan, S.; and Raghavan, M. 2017. Inherent trade-offs in the fair determination of risk scores. In Proc. of the 8th Innovations in Theoretical Computer Science Conference (ITCS).

Lipton, R. J.; Markakis, E.; Mossel, E.; and Saberi, A. 2004. On approximately fair allocations of indivisible goods. In Proc. of the 6th ACM Conf. on Electronic Commerce (EC).

Manurangsi, P., and Suksompong, W. 2017. Asymptotic existence of fair divisions for groups. Mathematical Social Sciences 89:100–108.

Segal-Halevi, E., and Nitzan, S. 2015. Fair cake-cutting among families. CoRR arXiv: https://arxiv.org/abs/1510.03903.

Segal-Halevi, E., and Suksompong, W. 2018. Democratic fair allocation of indivisible goods. In Proc. of the 27th International Joint Conf. on Artificial Intelligence (IJCAI).

Segal-Halevi, E., and Sziklai, B. 2018. Monotonicity and competitive equilibrium in cake-cutting. Economic Theory 1–39.

Steinhaus, H. 1948. The problem of fair division. Econometrica 16:101–104.

Suksompong, W. 2018. Approximate maximin shares for groups of agents. Mathematical Social Sciences 92:40–47.

Todo, T.; Li, R.; Hu, X.; Mouri, T.; Iwasaki, A.; and Yokoo, M. 2011. Generalizing envy-freeness toward group of agents. In Proc. of the 22nd International Joint Conf. on Artificial Intelligence (IJCAI).

Zhang, Z., and Neill, D. B. 2016. Identifying significant predictive bias in classifiers. CoRR arXiv: https://arxiv.org/abs/1611.08292.

24 Imperceptible Adversarial Attacks on Tabular Data

FGSM area under the ROC curve (AUC) Mutual Information-based Fair Representations (MIFR)

References

Ian Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. 2015. URL http://arxiv.org/abs/1412.6572.

Battista Biggio, Igino Corona, Davide Maiorca, Blaine Nelson, Nedim Šrndić, Pavel Laskov, Giorgio Giacinto, and Fabio Roli. Evasion attacks against machine learning at test time. In Hendrik Blockeel, Kristian Kersting, Siegfried Nijssen, and Filip Železný, editors, Machine Learning and Knowledge Discovery in Databases, 2013.

Alexey Kurakin, Ian J. Goodfellow, and Samy Bengio. Adversarial machine learning at scale. ICLR 2017, 2017.

Nicholas Carlini and David A. Wagner. Adversarial examples are not easily detected: Bypassing ten detection methods. 10th ACM Workshop on Artificial Intelligence and Security, 2017.

Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. In International Conference on Learning Representations, 2014.

Danny Karmon, Daniel Zoran, and Yoav Goldberg. Lavan: Localized and visible adversarial noise. CoRR, 2018.

Tom B. Brown, Dandelion Mané, Aurko Roy, Martín Abadi, and Justin Gilmer. Adversarial patch. CoRR, 2017.

Mahmood Sharif, Lujo Bauer, and Michael K. Reiter. On the suitability of lp-norms for creating and preventing adversarial examples. IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2018.

Dheeru Dua and Casey Graff. UCI machine learning repository, 2017.

I-Cheng Yeh and Che-hui Lien. The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients. Expert Systems with Applications, 2009.

Wendy Kan. Lending club loan data, version 1, 2019. URL https://www.kaggle.com/wendykan/lending-club-loan-data.

Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. Deepfool: a simple and accurate method to fool deep neural networks. IEEE conference on computer vision and pattern recognition, 2015.

N. Papernot, P. McDaniel, I. Goodfellow, S. Jha, Z. Berkay Celik, and A. Swami. Practical Black-Box Attacks against Machine Learning. 2017 ACM on Asia conference on computer and communications security, 2016.

Y. Liu, X. Chen, C. Liu, and D. Song. Delving into Transferable Adversarial Examples and Black-box Attacks. CoRR, 2016.

25 Improving Fairness in Machine Learning Systems: What Do Industry Practitioners Need?

NSF ONR AFOSR

Bibliography

Alexander A Alemi, Ian Fischer, Joshua V Dillon, and Kevin Murphy. Deep variational information bottleneck. arXiv preprint arXiv: https://arxiv.org/abs/1612.00410, December 2016.

Alexander A Alemi, Ben Poole, Ian Fischer, Joshua V Dillon, Rif A Saurous, and Kevin Murphy. Fixing a broken ELBO. arXiv preprint arXiv: https://arxiv.org/abs/1711.00464, November 2017.

T Calders, F Kamiran, and M Pechenizkiy. Building classifiers with independency constraints. In 2009 IEEE International Conference on Data Mining Workshops, pages 13–18, December 2009. doi: 10.1109/ICDMW.2009.83.

Flavio P Calmon, Dennis Wei, Karthikeyan Natesan Ramamurthy, and Kush R Varshney. Optimized data Pre-Processing for discrimination prevention. arXiv preprint arXiv: https://arxiv.org/abs/1704.03354, April 2017.

Xi Chen, Diederik P Kingma, Tim Salimans, Yan Duan, Prafulla Dhariwal, John Schulman, Ilya Sutskever, and Pieter Abbeel. Variational lossy autoencoder. arXiv preprint arXiv: https://arxiv.org/abs/1611.02731, November 2016.

Alexandra Chouldechova. Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Big data, 5(2):153–163, June 2017. ISSN 2167-647X, 2167-6461. doi: 10.1089/big.2016. 0047. Laurent Dinh, Jascha Sohl-Dickstein, and Samy Bengio. Density estimation using real NVP. arXiv preprint arXiv: https://arxiv.org/abs/1605.08803, May 2016.

Harrison Edwards and Amos Storkey. Censoring representations with an adversary. arXiv preprint arXiv: https://arxiv.org/abs/1511.05897, November 2015.

Stephan Eissman, Daniel Levy, Rui Shu, Stefan Bartzsch, and Stefano Ermon. Bayesian optimization and attribute adjustment. In Proc. 34th Conference on Uncertainty in Artificial Intelligence, 2018.

Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in neural information processing systems, pages 2672–2680, 2014.

Arthur Gretton, Karsten M Borgwardt, Malte Rasch, Bernhard Schölkopf, and Alex J Smola. A kernel method for the two-sample-problem. In Advances in neural information processing systems, pages 513– 520, 2007.

Aditya Grover, , and Stefano Ermon. Uncertainty autoencoders: Learning compressed representations via variational information maximization. In AISTATS, 2019.

Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and Aaron C Courville. Improved training of wasserstein gans. In Advances in Neural Information Processing Systems, pages 5769–5779, 2017.

Moritz Hardt, Eric Price, Nati Srebro, and Others. Equality of opportunity in supervised learning. In Advances in neural information processing systems, pages 3315–3323, 2016.

Diederik P Kingma and Max Welling. Auto- Encoding variational bayes. arXiv preprint arXiv: https://arxiv.org/abs/1312.6114v10, December 2013.

Diederik P Kingma, Tim Salimans, Rafal Jozefowicz, Xi Chen, Ilya Sutskever, and MaxWelling. Improving variational inference with inverse autoregressive flow. arXiv preprint arXiv: https://arxiv.org/abs/1606.04934, June 2016.

Jon Kleinberg, Sendhil Mullainathan, and Manish Raghavan. Inherent Trade-Offs in the fair determination of risk scores. arXiv preprint arXiv: https://arxiv.org/abs/1609.05807, September 2016.

Junpei Komiyama, Akiko Takeda, Junya Honda, and Hajime Shimao. Nonconvex optimization for regression with fairness constraints. In Jennifer Dy and Andreas Krause, editors, Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 2742–2751, Stockholmsmässan, Stockholm Sweden, 2018. PMLR.

Christos Louizos, Kevin Swersky, Yujia Li, Max Welling, and Richard Zemel. The variational fair autoencoder. arXiv preprint arXiv: https://arxiv.org/abs/1511.00830, November 2015.

Christos Louizos, Uri Shalit, Joris M Mooij, David Sontag, Richard Zemel, and Max Welling. Causal effect inference with deep Latent-Variable models. In I Guyon, U V Luxburg, S Bengio, H Wallach, R Fergus, S Vishwanathan, and R Garnett, editors, Advances in Neural Information Processing Systems 30, pages 6446–6456. Curran Associates, Inc., 2017.

David Madras, Elliot Creager, Toniann Pitassi, and Richard Zemel. Learning adversarially fair and transferable representations. arXiv preprint arXiv: https://arxiv.org/abs/1802.06309, February 2018.

Danilo Jimenez Rezende and Shakir Mohamed. Variational inference with normalizing flows. arXiv preprint arXiv: https://arxiv.org/abs/1505.05770, May 2015.

Aaron van den Oord, Nal Kalchbrenner, and Koray Kavukcuoglu. Pixel recurrent neural networks. arXiv preprint arXiv: https://arxiv.org/abs/1601.06759, January 2016.

Muhammad Bilal Zafar, Isabel Valera, Manuel Gomez Rodriguez, and Krishna P Gummadi. Learning fair classifiers. arXiv preprint arXiv: https://arxiv.org/abs/1507. 05259, 2015.

Learning Controllable Fair Representations Rich Zemel, Yu Wu, Kevin Swersky, Toni Pitassi, and Cynthia Dwork. Learning fair representations. In International Conference on Machine Learning, pages 325–333, February 2013.

Shengjia Zhao, Jiaming Song, and Stefano Ermon. InfoVAE: Information maximizing variational autoencoders. arXiv preprint arXiv: https://arxiv.org/abs/1706.02262, June 2017a.

Shengjia Zhao, Jiaming Song, and Stefano Ermon. Towards deeper understanding of variational autoencoding models. arXiv preprint arXiv: https://arxiv.org/abs/1702.08658, February 2017b.

Shengjia Zhao, Jiaming Song, and Stefano Ermon. A lagrangian perspective to latent variable generative models. Conference on Uncertainty in Artificial Intelligence, 2018.

Indre Zliobaite. On the relation between accuracy and fairness in binary classification. arXiv preprint arXiv: https://arxiv.org/abs/1505.05723, May 2015.

26 Large-Margin Softmax Loss for Convolutional Neural Networks

CASIA LFW PCA
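As a reminder of the core idea, the standard softmax target logit $\|W_{y_i}\|\|x_i\|\cos\theta_{y_i}$ is replaced by a margin version; schematically, following the paper's formulation, with $m$ the integer margin:

```math
L_i = -\log\frac{e^{\|W_{y_i}\|\|x_i\|\,\psi(\theta_{y_i})}}{e^{\|W_{y_i}\|\|x_i\|\,\psi(\theta_{y_i})} + \sum_{j\neq y_i} e^{\|W_j\|\|x_i\|\cos\theta_j}},
\qquad \psi(\theta)=\cos(m\theta)\ \ \text{for}\ 0\le\theta\le\tfrac{\pi}{m}
```

(For $\theta > \pi/m$, $\psi$ is extended so that it stays monotonically decreasing.)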

References

Asthana, Akshay, Zafeiriou, Stefanos, Cheng, Shiyang, and Pantic, Maja. Incremental face alignment in the wild. In CVPR, 2014.

Ding, Changxing and Tao, Dacheng. Robust face recognition via multimodal deep face representation. IEEE TMM, 17(11): 2049–2058, 2015.

Goodfellow, Ian J., Warde-Farley, David, Mirza, Mehdi, Courville, Aaron, and Bengio, Yoshua. Maxout networks. arXiv preprint arXiv: https://arxiv.org/abs/1302.4389, 2013.

Hadsell, Raia, Chopra, Sumit, and LeCun, Yann. Dimensionality reduction by learning an invariant mapping. In CVPR, 2006.

He, Kaiming, Zhang, Xiangyu, Ren, Shaoqing, and Sun, Jian. Deep residual learning for image recognition. arXiv preprint arXiv: https://arxiv.org/abs/1512.03385, 2015a.

He, Kaiming, Zhang, Xiangyu, Ren, Shaoqing, and Sun, Jian. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In ICCV, 2015b.

Hinton, Geoffrey E, Srivastava, Nitish, Krizhevsky, Alex, Sutskever, Ilya, and Salakhutdinov, Ruslan R. Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv: https://arxiv.org/abs/1207.0580, 2012.

Huang, Gary B, Ramesh, Manu, Berg, Tamara, and Learned-Miller, Erik. Labeled faces in the wild: A database for studying face recognition in unconstrained environments. Technical Report 07-49, University of Massachusetts, Amherst, 2007.

Ioffe, Sergey and Szegedy, Christian. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In ICML, 2015.

Jarrett, Kevin, Kavukcuoglu, Koray, Ranzato, Marc’Aurelio, and LeCun, Yann. What is the best multi-stage architecture for object recognition? In ICCV, 2009.

Jia, Yangqing, Shelhamer, Evan, Donahue, Jeff, Karayev, Sergey, Long, Jonathan, Girshick, Ross, Guadarrama, Sergio, and Darrell, Trevor. Caffe: Convolutional architecture for fast feature embedding. arXiv preprint arXiv: https://arxiv.org/abs/1408.5093, 2014.

Krizhevsky, Alex. Learning multiple layers of features from tiny images. Technical Report, 2009.

Krizhevsky, Alex, Sutskever, Ilya, and Hinton, Geoffrey E. Imagenet classification with deep convolutional neural networks. In NIPS, 2012.

LeCun, Yann, Cortes, Corinna, and Burges, Christopher JC. The mnist database of handwritten digits, 1998.

Lee, Chen-Yu, Xie, Saining, Gallagher, Patrick, Zhang, Zhengyou, and Tu, Zhuowen. Deeply-supervised nets. In AISTATS, 2015.

Lee, Chen-Yu, Gallagher, Patrick W, and Tu, Zhuowen. Generalizing pooling functions in convolutional neural networks: Mixed, gated, and tree. In AISTATS, 2016.

Liang, Ming and Hu, Xiaolin. Recurrent convolutional neural network for object recognition. In CVPR, 2015.

Lin, Min, Chen, Qiang, and Yan, Shuicheng. Network in network. In ICLR, 2014.

Nair, Vinod and Hinton, Geoffrey E. Rectified linear units improve restricted boltzmann machines. In ICML, 2010.

Parkhi, Omkar M, Vedaldi, Andrea, and Zisserman, Andrew. Deep face recognition. In BMVC, 2015.

Romero, Adriana, Ballas, Nicolas, Kahou, Samira Ebrahimi, Chassang, Antoine, Gatta, Carlo, and Bengio, Yoshua. Fitnets: Hints for thin deep nets. In ICLR, 2015.

Russakovsky, Olga, Deng, Jia, Su, Hao, Krause, Jonathan, Satheesh, Sanjeev, Ma, Sean, Huang, Zhiheng, Karpathy, Andrej, Khosla, Aditya, Bernstein, Michael, et al. Imagenet large scale visual recognition challenge. IJCV, pp. 1–42, 2014.

Schroff, Florian, Kalenichenko, Dmitry, and Philbin, James. Facenet: A unified embedding for face recognition and clustering. In CVPR, 2015.

Sermanet, Pierre, Eigen, David, Zhang, Xiang, Mathieu, Michaël, Fergus, Rob, and LeCun, Yann. Overfeat: Integrated recognition, localization and detection using convolutional networks. ICLR, 2014.

Simonyan, Karen and Zisserman, Andrew. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv: https://arxiv.org/abs/1409.1556, 2014.

Springenberg, Jost Tobias, Dosovitskiy, Alexey, Brox, Thomas, and Riedmiller, Martin. Striving for simplicity: The all convolutional net. In ICLR, 2015.

Srivastava, Nitish, Hinton, Geoffrey, Krizhevsky, Alex, Sutskever, Ilya, and Salakhutdinov, Ruslan. Dropout: A simple way to prevent neural networks from overfitting. JMLR, 15(1):1929–1958, 2014.

Stollenga, Marijn F, Masci, Jonathan, Gomez, Faustino, and Schmidhuber, Jürgen. Deep networks with internal selective attention through feedback connections. In NIPS, 2014.

Sun, Yi, Chen, Yuheng, Wang, Xiaogang, and Tang, Xiaoou. Deep learning face representation by joint identification-verification. In NIPS, 2014.

Sun, Yi, Wang, Xiaogang, and Tang, Xiaoou. Deeply learned face representations are sparse, selective, and robust. In CVPR, 2015.

Szegedy, Christian, Liu, Wei, Jia, Yangqing, Sermanet, Pierre, Reed, Scott, Anguelov, Dragomir, Erhan, Dumitru, Vanhoucke, Vincent, and Rabinovich, Andrew. Going deeper with convolutions. In CVPR, 2015.

Taigman, Yaniv, Yang, Ming, Ranzato, Marc’Aurelio, and Wolf, Lars. Deepface: Closing the gap to human-level performance in face verification. In CVPR, 2014.

Wan, Li, Zeiler, Matthew, Zhang, Sixin, Cun, Yann L, and Fergus, Rob. Regularization of neural networks using dropconnect. In ICML, 2013.

Yi, Dong, Lei, Zhen, Liao, Shengcai, and Li, Stan Z. Learning face representation from scratch. arXiv preprint arXiv: https://arxiv.org/abs/1411.7923, 2014.

Zeiler, Matthew D and Fergus, Rob. Stochastic pooling for regularization of deep convolutional neural networks. arXiv preprint arXiv: https://arxiv.org/abs/1301.3557, 2013.

27 Learning Adversarially Fair and Transferable Representations

MLP LAFTR

maximum mean discrepancy (MMD) Natural Sciences and Engineering Research Council of Canada (NSERC) primary condition group (PCG)
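Since MMD appears here as a non-adversarial way to match group representations, a minimal NumPy sketch of the (biased) RBF-kernel MMD² estimator from the Gretton et al. reference below (function names are illustrative):

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    """k(a_i, b_j) = exp(-gamma * ||a_i - b_j||^2) for all pairs."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def mmd2(x, y, gamma=1.0):
    """Biased estimate of E[k(x,x')] + E[k(y,y')] - 2 E[k(x,y)]."""
    return (rbf_kernel(x, x, gamma).mean()
            + rbf_kernel(y, y, gamma).mean()
            - 2 * rbf_kernel(x, y, gamma).mean())

# e.g. mmd2(z[a == 0], z[a == 1]) as a group gap on learned representations z
```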

References

Bechavod, Y. and Ligett, K. Learning fair classifiers: A regularization-inspired approach. arXiv preprint arXiv: https://arxiv.org/abs/1707.00044, 2017.

Beutel, A., Chen, J., Zhao, Z., and Chi, E. H. Data decisions and theoretical implications when adversarially learning fair representations. arXiv preprint arXiv: https://arxiv.org/abs/1707.00075, 2017.

Blitzer, J., McDonald, R., and Pereira, F. Domain adaptation with structural correspondence learning. In Proceedings of the 2006 conference on empirical methods in natural language processing, pp. 120–128. Association for Computational Linguistics, 2006.

Calmon, F., Wei, D., Vinzamuri, B., Ramamurthy, K. N., and Varshney, K. R. Optimized pre-processing for discrimination prevention. In Advances in Neural Information Processing Systems, pp. 3995–4004, 2017.

Chouldechova, A. Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Big data, 5(2):153–163, 2017.

Cover, T. M. and Thomas, J. A. Elements of information theory. John Wiley & Sons, 2012.

Dwork, C., Hardt, M., Pitassi, T., Reingold, O., and Zemel, R. Fairness through awareness. In Proceedings of the 3rd innovations in theoretical computer science conference, pp. 214–226. ACM, 2012.

Edwards, H. and Storkey, A. Censoring representations with an adversary. In International Conference on Learning Representations, 2016.

Ganin, Y., Ustinova, E., Ajakan, H., Germain, P., Larochelle, H., Laviolette, F., Marchand, M., and Lempitsky, V. Domain-adversarial training of neural networks. The Journal of Machine Learning Research, 17(1):2096–2030, 2016.

Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. Generative adversarial nets. In Advances in neural information processing systems, pp. 2672–2680, 2014.

Gretton, A., Borgwardt, K. M., Rasch, M., Schölkopf, B., and Smola, A. J. A kernel method for the two-sample problem. In Advances in neural information processing systems, pp. 513–520, 2007.

Gutmann, M. and Hyvärinen, A. Noise-contrastive estimation: A new estimation principle for unnormalized statistical models. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp. 297–304, 2010.

Hajian, S., Domingo-Ferrer, J., Monreale, A., Pedreschi, D., and Giannotti, F. Discrimination-and privacy-aware patterns. Data Mining and Knowledge Discovery, 29(6): 1733–1782, 2015.

Hardt, M., Price, E., Srebro, N., et al. Equality of opportunity in supervised learning. In Advances in neural information processing systems, pp. 3315–3323, 2016.

Hébert-Johnson, U., Kim, M. P., Reingold, O., and Rothblum, G. N. Calibration for the (computationally-identifiable) masses. In International Conference on Machine Learning, 2018.

Hinton, G. E. and Salakhutdinov, R. R. Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507, 2006.

Kamishima, T., Akaho, S., Asoh, H., and Sakuma, J. Fairness-aware classifier with prejudice remover regularizer. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 35–50. Springer, 2012.

Kearns, M., Neel, S., Roth, A., and Wu, Z. S. Preventing fairness gerrymandering: Auditing and learning for subgroup fairness. In International Conference on Machine Learning, 2018.

Kingma, D. and Ba, J. Adam: A method for stochastic optimization. In International Conference on Learning Representations, 2015.

Kleinberg, J., Mullainathan, S., and Raghavan, M. Inherent trade-offs in the fair determination of risk scores. arXiv preprint arXiv: https://arxiv.org/abs/1609.05807, 2016.

Louizos, C., Swersky, K., Li, Y., Welling, M., and Zemel, R. The variational fair autoencoder. In International Conference on Learning Representations, 2016.

Luc, P., Couprie, C., Chintala, S., and Verbeek, J. Semantic segmentation using adversarial networks. In NIPS Workshop on Adversarial Training, 2016.

Madras, D., Pitassi, T., and Zemel, R. Predict responsibly: Increasing fairness by learning to defer. arXiv preprint arXiv: https://arxiv.org/abs/1711.06664, 2017.

McNamara, D., Ong, C. S., and Williamson, R. C. Provably fair representations. arXiv preprint arXiv: https://arxiv.org/abs/1710.04394, 2017.

Odena, A. Semi-supervised learning with generative adversarial networks. arXiv preprint arXiv: https://arxiv.org/abs/1606.01583, 2016.

Pleiss, G., Raghavan, M., Wu, F., Kleinberg, J., and Weinberger, K. Q. On fairness and calibration. In Advances in Neural Information Processing Systems, pp. 5684–5693, 2017.

Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., and Chen, X. Improved techniques for training gans. In Advances in Neural Information Processing Systems, pp. 2234–2242, 2016.

Schmidhuber, J. Learning factorial codes by predictability minimization. Neural Computation, 4(6):863–879, 1992.

Zafar, M. B., Valera, I., Gomez Rodriguez, M., and Gummadi, K. P. Fairness beyond disparate treatment & disparate impact: Learning classification without disparate mistreatment. In Proceedings of the 26th International Conference on World Wide Web, pp. 1171–1180. International World Wide Web Conferences Steering Committee, 2017.

Zemel, R., Wu, Y., Swersky, K., Pitassi, T., and Dwork, C. Learning fair representations. In International Conference on Machine Learning, pp. 325–333, 2013.

Zhang, B. H., Lemoine, B., and Mitchell, M. Mitigating unwanted biases with adversarial learning. arXiv preprint arXiv: https://arxiv.org/abs/1801.07593, 2018.

28 Learning Controllable Fair Representations

Mutual Information-based Fair Representations (MIFR).
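Schematically, and only as a hedged summary (the exact information-theoretic terms are in the paper), this line of work poses fairness as constrained representation learning and handles the constraints with Lagrange multipliers:

```math
\min_{\theta}\ \max_{\lambda \ge 0}\ L_{\text{task}}(\theta) + \sum_i \lambda_i \bigl(C_i(\theta) - \epsilon_i\bigr)
```

where each $C_i(\theta) \le \epsilon_i$ is a mutual-information-based constraint on the learned representation.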

Bibliography

Alexander A Alemi, Ian Fischer, Joshua V Dillon, and Kevin Murphy. Deep variational information bottleneck. arXiv preprint arXiv: https://arxiv.org/abs/1612.00410, December 2016.

Alexander A Alemi, Ben Poole, Ian Fischer, Joshua V Dillon, Rif A Saurous, and Kevin Murphy. Fixing a broken ELBO. arXiv preprint arXiv: https://arxiv.org/abs/1711.00464, November 2017.

T Calders, F Kamiran, and M Pechenizkiy. Building classifiers with independency constraints. In 2009 IEEE International Conference on Data Mining Workshops, pages 13–18, December 2009. doi: 10.1109/ICDMW.2009.83.

Flavio P Calmon, Dennis Wei, Karthikeyan Natesan Ramamurthy, and Kush R Varshney. Optimized data Pre-Processing for discrimination prevention. arXiv preprint arXiv: https://arxiv.org/abs/1704.03354, April 2017.

Xi Chen, Diederik P Kingma, Tim Salimans, Yan Duan, Prafulla Dhariwal, John Schulman, Ilya Sutskever, and Pieter Abbeel. Variational lossy autoencoder. arXiv preprint arXiv: https://arxiv.org/abs/1611.02731, November 2016.

Alexandra Chouldechova. Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Big data, 5(2):153–163, June 2017. ISSN 2167-647X, 2167-6461. doi: 10.1089/big.2016.0047.

Laurent Dinh, Jascha Sohl-Dickstein, and Samy Bengio. Density estimation using real NVP. arXiv preprint arXiv: https://arxiv.org/abs/1605.08803, May 2016.

Harrison Edwards and Amos Storkey. Censoring representations with an adversary. arXiv preprint arXiv: https://arxiv.org/abs/1511.05897, November 2015.

Stephan Eissman, Daniel Levy, Rui Shu, Stefan Bartzsch, and Stefano Ermon. Bayesian optimization and attribute adjustment. In Proc. 34th Conference on Uncertainty in Artificial Intelligence, 2018.

Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in neural information processing systems, pages 2672–2680, 2014.

Arthur Gretton, Karsten M Borgwardt, Malte Rasch, Bernhard Schölkopf, and Alex J Smola. A kernel method for the two-sample problem. In Advances in neural information processing systems, pages 513–520, 2007.

Aditya Grover and Stefano Ermon. Uncertainty autoencoders: Learning compressed representations via variational information maximization. In AISTATS, 2019.

Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and Aaron C Courville. Improved training of wasserstein gans. In Advances in Neural Information Processing Systems, pages 5769–5779, 2017.

Moritz Hardt, Eric Price, Nati Srebro, and Others. Equality of opportunity in supervised learning. In Advances in neural information processing systems, pages 3315–3323, 2016.

Diederik P Kingma and Max Welling. Auto-Encoding variational bayes. arXiv preprint arXiv: https://arxiv.org/abs/1312.6114v10, December 2013.

Diederik P Kingma, Tim Salimans, Rafal Jozefowicz, Xi Chen, Ilya Sutskever, and Max Welling. Improving variational inference with inverse autoregressive flow. arXiv preprint arXiv: https://arxiv.org/abs/1606.04934, June 2016.

Jon Kleinberg, Sendhil Mullainathan, and Manish Raghavan. Inherent Trade-Offs in the fair determination of risk scores. arXiv preprint arXiv: https://arxiv.org/abs/1609.05807, September 2016.

Junpei Komiyama, Akiko Takeda, Junya Honda, and Hajime Shimao. Nonconvex optimization for regression with fairness constraints. In Jennifer Dy and Andreas Krause, editors, Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 2742–2751, Stockholmsmässan, Stockholm Sweden, 2018. PMLR.

Christos Louizos, Kevin Swersky, Yujia Li, Max Welling, and Richard Zemel. The variational fair autoencoder. arXiv preprint arXiv: https://arxiv.org/abs/1511.00830, November 2015.

Christos Louizos, Uri Shalit, Joris M Mooij, David Sontag, Richard Zemel, and Max Welling. Causal effect inference with deep Latent-Variable models. In I Guyon, U V Luxburg, S Bengio, H Wallach, R Fergus, S Vishwanathan, and R Garnett, editors, Advances in Neural Information Processing Systems 30, pages 6446–6456. Curran Associates, Inc., 2017.

David Madras, Elliot Creager, Toniann Pitassi, and Richard Zemel. Learning adversarially fair and transferable representations. arXiv preprint arXiv: https://arxiv.org/abs/1802.06309, February 2018.

Danilo Jimenez Rezende and Shakir Mohamed. Variational inference with normalizing flows. arXiv preprint arXiv: https://arxiv.org/abs/1505.05770, May 2015.

Aaron van den Oord, Nal Kalchbrenner, and Koray Kavukcuoglu. Pixel recurrent neural networks. arXiv preprint arXiv: https://arxiv.org/abs/1601.06759, January 2016.

Muhammad Bilal Zafar, Isabel Valera, Manuel Gomez Rodriguez, and Krishna P Gummadi. Learning fair classifiers. arXiv preprint arXiv: https://arxiv.org/abs/1507.05259, 2015.

Rich Zemel, Yu Wu, Kevin Swersky, Toni Pitassi, and Cynthia Dwork. Learning fair representations. In International Conference on Machine Learning, pages 325–333, February 2013.

Shengjia Zhao, Jiaming Song, and Stefano Ermon. InfoVAE: Information maximizing variational autoencoders. arXiv preprint arXiv: https://arxiv.org/abs/1706.02262, June 2017a.

Shengjia Zhao, Jiaming Song, and Stefano Ermon. Towards deeper understanding of variational autoencoding models. arXiv preprint arXiv: https://arxiv.org/abs/1702.08658, February 2017b.

Shengjia Zhao, Jiaming Song, and Stefano Ermon. A Lagrangian perspective on latent variable generative models. Conference on Uncertainty in Artificial Intelligence, 2018.

Indre Zliobaite. On the relation between accuracy and fairness in binary classification. arXiv preprint arXiv: https://arxiv.org/abs/1505.05723, May 2015.

29 Learning Fair Representations

Fair Naive-Bayes (FNB) LFR (Learned Fair Representations) linear predictor (LR)
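As a concrete anchor for the group-fairness notion these methods target, a minimal sketch of the demographic-parity gap for binary predictions and a binary protected attribute (names are illustrative, not from the paper):

```python
import numpy as np

def demographic_parity_gap(y_pred, a):
    """|P(yhat = 1 | a = 1) - P(yhat = 1 | a = 0)|."""
    return abs(y_pred[a == 1].mean() - y_pred[a == 0].mean())

# y_pred = np.array([1, 0, 1, 1]); a = np.array([0, 0, 1, 1])
# demographic_parity_gap(y_pred, a)  # |1.0 - 0.5| = 0.5
```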

References

Calders, T. and Verwer, S. Three naive bayes approaches for discrimination-free classification. Data Mining and Knowledge Discovery, 21:277–292, 2010.

Dwork, C. and Mulligan, D. Privacy and classification concerns in online behavioral targeting: Mapping objections and sketching solutions. In Privacy Law Scholars Conference, 2012.

Dwork, C., McSherry, F., Nissim, K., and Smith, A. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography Conference (TCC), 2006.

Dwork, C., Hardt, M., Pitassi, T., Reingold, O., and Zemel, R. Fairness through awareness. In Proceedings of Innovations in Theoretical Computer Science, 2011.

Frank, A. and Asuncion, A. UCI machine learning repository, 2010. URL http://archive.ics.uci.edu/ml.

Kamiran, F. and Calders, T. Classifying without discriminating. In 2nd International Conference on Computer, Control and Communication, pp. 1–6, 2009.

Kamishima, T., Akaho, S., and Sakuma, J. Fairness-aware learning through regularization approach. In IEEE 11th International Conference on Data Mining, pp. 643–650, 2011.

Kohavi, R. Scaling up the accuracy of naive-bayes classifiers: a decision-tree hybrid. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, 1996.

Luong, B., Ruggieri, S., and Turini, F. k-NN as an implementation of situation testing for discrimination discovery and prevention. In Proceedings of the 17th ACM KDD Conference, pp. 502–510, 2011.

Pedreschi, D., Ruggieri, S., and Turini, F. Discrimination-aware data mining. In Proceedings of the 14th ACM KDD Conference, pp. 560–568, 2008.

Tishby, N., Pereira, F.C., and Bialek, W. The Information Bottleneck method. In The 37th Annual Allerton Conference on Communication, Control, and Computing, 1999.

Zarsky, T. Automated prediction: Perception, law, and policy. CACM 15 (9), 2012.

30 Learning to Model the Tail

PCA SUN t-SNE National Science Foundation (NSF)
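For contrast with the paper's meta-learning approach, the most common long-tail baseline is simply to reweight the loss by inverse class frequency; a minimal sketch (a generic baseline, not the paper's method):

```python
import numpy as np

def inverse_frequency_weights(labels, num_classes):
    """Per-class loss weights: rare classes get proportionally larger weights."""
    counts = np.bincount(labels, minlength=num_classes).astype(float)
    w = 1.0 / np.maximum(counts, 1.0)
    return w * (num_classes / w.sum())  # normalize so the mean weight is 1

# labels = np.array([0, 0, 0, 0, 1]) -> weights [0.4, 1.6]
```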

References

[1] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, 2012.

[2] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In ICLR, 2015.

[3] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In CVPR, 2015.

[4] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In CVPR, 2016.

[5] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei. ImageNet large scale visual recognition challenge. IJCV, 115(3):211–252, 2015.

[6] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft COCO: Common objects in context. In ECCV, 2014.

[7] B. Zhou, A. Lapedriza, A. Khosla, A. Oliva, and A. Torralba. Places: A 10 million image database for scene recognition. TPAMI, 2017.

[8] X. Zhu, D. Anguelov, and D. Ramanan. Capturing long-tail distributions of object subcategories. In CVPR, 2014.

[9] X. Zhu, C. Vondrick, C. C. Fowlkes, and D. Ramanan. Do we need more training data? IJCV, 119(1):76–92, 2016.

[10] G. Van Horn and P. Perona. The devil is in the tails: Fine-grained classification in the wild. arXiv preprint arXiv: https://arxiv.org/abs/1709.01450, 2017.

[11] M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman. The PASCAL visual object classes (VOC) challenge. IJCV, 88(2):303–338, 2010.

[12] R. Krishna, Y. Zhu, O. Groth, J. Johnson, K. Hata, J. Kravitz, S. Chen, Y. Kalanditis, L.-J. Li, D. A. Shamma, M. Bernstein, and L. Fei-Fei. Visual genome: Connecting language and vision using crowdsourced dense image annotations. IJCV, 123(1):32–73, 2017.

[13] W. Ouyang, X. Wang, C. Zhang, and X. Yang. Factors in finetuning deep model for object detection with long-tail distribution. In CVPR, 2016.

[14] J. Xiao, K. A. Ehinger, J. Hays, A. Torralba, and A. Oliva. SUN database: Exploring a large collection of scene categories. IJCV, 119(1):3–22, 2016.

[15] S. Bengio. Sharing representations for long tail computer vision problems. In ICMI, 2015.

[16] L. Shen, Z. Lin, and Q. Huang. Relay backpropagation for effective learning of deep convolutional neural networks. In ECCV, 2016.

[17] Q. Zhong, C. Li, Y. Zhang, H. Sun, S. Yang, D. Xie, and S. Pu. Towards good practices for recognition & detection. In CVPR workshops, 2016.

[18] S. J. Pan and Q. Yang. A survey on transfer learning. TKDE, 22(10):1345–1359, 2010.

[19] J. Yosinski, J. Clune, Y. Bengio, and H. Lipson. How transferable are features in deep neural networks? In NIPS, 2014.

[20] G. E. Hinton and R. R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507, 2006.

[21] Y.-X. Wang and M. Hebert. Learning to learn: Model regression networks for easy small sample learning. In ECCV, 2016.

[22] M. Andrychowicz, M. Denil, S. Gomez, M. W. Hoffman, D. Pfau, T. Schaul, and N. de Freitas. Learning to learn by gradient descent by gradient descent. In NIPS, 2016.

[23] Y.-X. Wang and M. Hebert. Learning from small sample sets by combining unsupervised meta-training with CNNs. In NIPS, 2016.

[24] K. Li and J. Malik. Learning to optimize. In ICLR, 2017.

[25] S. Ravi and H. Larochelle. Optimization as a model for few-shot learning. In ICLR, 2017.

[26] A. Sinha, M. Sarkar, A. Mukherjee, and B. Krishnamurthy. Introspection: Accelerating neural network training by learning weight evolution. In ICLR, 2017.

[27] H. He and E. A. Garcia. Learning from imbalanced data. TKDE, 21(9):1263–1284, 2009.

[28] C. Huang, Y. Li, C. C. Loy, and X. Tang. Learning deep representation for imbalanced classification. In CVPR, 2016.

[29] S. Thrun and L. Pratt. Learning to learn. Springer Science & Business Media, 2012.

[30] J. Schmidhuber, J. Zhao, and M. Wiering. Shifting inductive bias with success-story algorithm, adaptive levin search, and incremental self-improvement. Machine Learning, 28(1):105–130, 1997.

[31] R. Caruana. Multitask learning. Machine Learning, 28(1):41–75, 1997.

[32] J. Schmidhuber. Evolutionary principles in self-referential learning. On learning how to learn: The meta-meta-... hook.) Diploma thesis, Institut f. Informatik, Tech. Univ. Munich, 1987.

[33] J. Schmidhuber. Learning to control fast-weight memories: An alternative to dynamic recurrent networks. Neural Computation, 4(1):131–139, 1992.

[34] J. Schmidhuber. A neural network that embeds its own meta-levels. In IEEE International Conference on Neural Networks, 1993.

[35] L. Bertinetto, J. F. Henriques, J. Valmadre, P. Torr, and A. Vedaldi. Learning feed-forward one-shot learners. In NIPS, 2016.

[36] D. Ha, A. Dai, and Q. V. Le. Hypernetworks. In ICLR, 2017.

[37] C. Finn, P. Abbeel, and S. Levine. Model-agnostic meta-learning for fast adaptation of deep networks. In ICML, 2017.

[38] S.-A. Rebuffi, H. Bilen, and A. Vedaldi. Learning multiple visual domains with residual adapters. In NIPS, 2017.

[39] T. Munkhdalai and H. Yu. Meta networks. In ICML, 2017.

[40] R. Socher, M. Ganjoo, C. D. Manning, and A. Ng. Zero-shot learning through cross-modal transfer. In NIPS, 2013.

[41] J. Ba, K. Swersky, S. Fidler, and R. Salakhutdinov. Predicting deep zero-shot convolutional neural networks using textual descriptions. In ICCV, 2015.

[42] H. Noh, P. H. Seo, and B. Han. Image question answering using convolutional neural network with dynamic parameter prediction. In CVPR, 2016.

[43] L. Fei-Fei, R. Fergus, and P. Perona. One-shot learning of object categories. TPAMI, 28(4):594–611, 2006.

[44] Y.-X. Wang and M. Hebert. Model recommendation: Generating object detectors from few samples. In CVPR, 2015.

[45] G. Koch, R. Zemel, and R. Salakhutdinov. Siamese neural networks for one-shot image recognition. In ICML Workshops, 2015.

[46] B. M. Lake, R. Salakhutdinov, and J. B. Tenenbaum. Human-level concept learning through probabilistic program induction. Science, 350(6266):1332–1338, 2015.

[47] A. Santoro, S. Bartunov, M. Botvinick, D. Wierstra, and T. Lillicrap. One-shot learning with memory-augmented neural networks. In ICML, 2016.

[48] Y.-X. Wang and M. Hebert. Learning by transferring from unsupervised universal sources. In AAAI, 2016.

[49] Z. Li and D. Hoiem. Learning without forgetting. In ECCV, 2016.

[50] B. Hariharan and R. Girshick. Low-shot visual recognition by shrinking and hallucinating features. In ICCV, 2017.

[51] J. Bromley, J. W. Bentz, L. Bottou, I. Guyon, Y. LeCun, C. Moore, E. Säckinger, and R. Shah. Signature verification using a "siamese" time delay neural network. International Journal of Pattern Recognition and Artificial Intelligence, 7(4):669–688, 1993.

[52] O. Vinyals, C. Blundell, T. Lillicrap, K. Kavukcuoglu, and D. Wierstra. Matching networks for one shot learning. In NIPS, 2016.

[53] J. Snell, K. Swersky, and R. S. Zemel. Prototypical networks for few-shot learning. In NIPS, 2017.

[54] Y. Fu, T. Xiang, Y.-G. Jiang, X. Xue, L. Sigal, and S. Gong. Recent advances in zero-shot recognition: Toward data-efficient understanding of visual content. IEEE Signal Processing Magazine, 35(1):112–125, 2018.

[55] D. George, W. Lehrach, K. Kansky, M. Lázaro-Gredilla, C. Laan, B. Marthi, X. Lou, Z. Meng, Y. Liu, H. Wang, A. Lavin, and D. S. Phoenix. A generative vision model that trains with high data efficiency and breaks text-based CAPTCHAs. Science, 2017.

[56] E. Triantafillou, R. Zemel, and R. Urtasun. Few-shot learning through an information retrieval lens. In NIPS, 2017.

[57] K. He, X. Zhang, S. Ren, and J. Sun. Identity mappings in deep residual networks. In ECCV, 2016.

[58] C. Sun, A. Shrivastava, S. Singh, and A. Gupta. Revisiting unreasonable effectiveness of data in deep learning era. In ICCV, 2017.

[59] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. In ACM MM, 2014.

[60] J. Donahue, Y. Jia, O. Vinyals, J. Hoffman, N. Zhang, E. Tzeng, and T. Darrell. Decaf: A deep convolutional activation feature for generic visual recognition. In ICML, 2014.

[61] P. Agrawal, R. Girshick, and J. Malik. Analyzing the performance of multilayer neural networks for object recognition. In ECCV, 2014.

[62] M. Huh, P. Agrawal, and A. A. Efros. What makes ImageNet good for transfer learning? In NIPS workshops, 2016.

[63] Y.-X. Wang, D. Ramanan, and M. Hebert. Growing a brain: Fine-tuning by increasing model capacity. In CVPR, 2017.

[64] L. van der Maaten and G. Hinton. Visualizing data using t-SNE. JMLR, 9(Nov):2579–2605, 2008.

31 Limitations of the Lipschitz constant as a defense against adversarial examples

CoRR CRA
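The Lipschitz bound the title refers to is usually the naive layer-wise one: for $f = W_L \circ \sigma \circ \dots \circ \sigma \circ W_1$ with 1-Lipschitz activations, $\mathrm{Lip}(f) \le \prod_i \|W_i\|_2$. A minimal NumPy sketch of that upper bound (illustrative only; the paper's point is precisely that such bounds can be too loose to certify robustness):

```python
import numpy as np

def lipschitz_upper_bound(weights):
    """Product of per-layer spectral norms (largest singular values)."""
    return float(np.prod([np.linalg.norm(W, ord=2) for W in weights]))

# Ws = [np.random.randn(64, 32), np.random.randn(10, 64)]  # two dense layers
# lipschitz_upper_bound(Ws)
```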

References

  1. Athalye, A., Carlini, N., Wagner, D.A.: Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. CoRR abs/1802.00420 (2018)

  2. Carlini, N., Wagner, D.A.: Adversarial examples are not easily detected: Bypassing ten detection methods. In: AISec@CCS (2017)

  3. Carlini, N., Wagner, D.A.: Towards evaluating the robustness of neural networks. 2017 IEEE Symposium on Security and Privacy (SP) pp. 39–57 (2017)

  4. Cissé, M., Bojanowski, P., Grave, E., Dauphin, Y., Usunier, N.: Parseval networks: Improving robustness to adversarial examples. In: ICML (2017)

  5. Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., Kuksa, P.P.: Natural language processing (almost) from scratch. Journal of Machine Learning Research 12, 2493–2537 (2011)

  6. Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. CoRR abs/1412.6572 (2014)

  7. Hinton, G.E., Deng, L., Yu, D., Dahl, G.E., Mohamed, A.-r., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Sainath, T.N., Kingsbury, B.: Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine 29, 82–97 (2012)

  8. Kolter, J.Z., Wong, E.: Provable defenses against adversarial examples via the convex outer adversarial polytope. CoRR abs/1711.00851 (2017)

  9. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: F. Pereira, C.J.C. Burges, L. Bottou, K.Q. Weinberger (eds.) Advances in Neural Information Processing Systems 25, pp. 1097–1105. Curran Associates, Inc. (2012)

  10. Madry, A., Makelov, A., Schmidt, L., Tsipras, D., Vladu, A.: Towards deep learning models resistant to adversarial attacks. CoRR abs/1706.06083 (2017)

  11. Papernot, N., McDaniel, P.D., Goodfellow, I.J., Jha, S., Celik, Z.B., Swami, A.: Practical black-box attacks against machine learning. In: AsiaCCS (2017)

  12. Qian, H., Wegman, M.N.: L2-nonexpansive neural networks. CoRR abs/1802.07896 (2018)

  13. Raghunathan, A., Steinhardt, J., Liang, P.: Certified defenses against adversarial examples. CoRR abs/1801.09344 (2018)

  14. Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I.J., Fergus, R.: Intriguing properties of neural networks. CoRR abs/1312.6199 (2013)

  15. Tramèr, F., Papernot, N., Goodfellow, I.J., Boneh, D., McDaniel, P.D.: The space of transferable adversarial examples. CoRR abs/1704.03453 (2017)

  16. Tsuzuku, Y., Sato, I., Sugiyama, M.: Lipschitz-margin training: Scalable certification of perturbation invariance for deep neural networks. CoRR abs/1802.04034 (2018)

32 Max-margin Class Imbalanced Learning with Gaussian Affinity

Synthetic Minority Oversampling TEchnique (SMOTE) Labelled Faces in the Wild (LFW) YouTube Faces (YTF) Celebrities in Frontal Profile (CFP) frontal-frontal (FF) frontal-profile (FP) Squeeze and Excitation (SE)
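Since SMOTE heads the abbreviation list, a minimal sketch of the Chawla et al. idea: synthesize minority samples by interpolating between a minority point and one of its k nearest minority neighbors (illustrative only; use a library such as imbalanced-learn in practice):

```python
import numpy as np

def smote(X_min, n_new, k=5, seed=0):
    """X_min: minority-class samples, shape (n, d); returns n_new synthetic rows."""
    rng = np.random.default_rng(seed)
    out = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        d2 = ((X_min - X_min[i]) ** 2).sum(1)   # squared distances to all minority points
        nn = np.argsort(d2)[1:k + 1]            # k nearest neighbors, skipping self
        j = rng.choice(nn)
        lam = rng.random()                      # interpolation coefficient in [0, 1)
        out.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.stack(out)
```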

References

[1] L. Ballerini, R. B. Fisher, B. Aldridge, and J. Rees. Nonmelanoma skin lesion classification using colour image data in a hierarchical k-nn classifier. In Biomedical Imaging (ISBI), 2012 9th IEEE International Symposium on, pages 358–361. IEEE, 2012.

[2] L. Ballerini, R. B. Fisher, B. Aldridge, and J. Rees. A color and texture based hierarchical k-nn approach to the classification of non-melanoma skin lesions. In Color Medical Image Analysis, pages 63–86. Springer, 2013.

[3] R. Barandela, E. Rangel, J. S. Sánchez, and F. J. Ferri. Restricted decontamination for the imbalanced training sample problem. In Iberoamerican Congress on Pattern Recognition, pages 424–431. Springer, 2003.

[4] Q. Cao, L. Shen, W. Xie, O. M. Parkhi, and A. Zisserman. Vggface2: A dataset for recognising faces across pose and age. In International Conference on Automatic Face and Gesture Recognition, 2018.

[5] C. L. Castro and A. P. Braga. Novel cost-sensitive approach to improve the multilayer perceptron performance on imbalanced data. IEEE transactions on neural networks and learning systems, 24(6):888–899, 2013.

[6] J.-R. Chang and Y.-S. Chen. Batch-normalized maxout network in network. arXiv preprint arXiv: https://arxiv.org/abs/1511.02583, 2015.

[7] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer. Smote: synthetic minority over-sampling technique. Journal of artificial intelligence research, 16:321–357, 2002.

[8] B. Chen, W. Deng, and J. Du. Noisy softmax: Improving the generalization ability of dcnn via postponing the early softmax saturation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5372–5381, 2017.

[9] C. Cortes and V. Vapnik. Support-vector networks. Machine learning, 20(3):273–297, 1995.

[10] J. Deng, J. Guo, and S. Zafeiriou. Arcface: Additive angular margin loss for deep face recognition. arXiv preprint arXiv: https://arxiv.org/abs/1801.07698, 2018.

[11] J. Deng, Y. Zhou, and S. Zafeiriou. Marginal loss for deep face recognition. In 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 2006–2014. IEEE, 2017.

[12] G. F. Elsayed, D. Krishnan, H. Mobahi, K. Regan, and S. Bengio. Large margin deep networks for classification. arXiv preprint arXiv: https://arxiv.org/abs/1803.05598, 2018.

[13] R. Hadsell, S. Chopra, and Y. LeCun. Dimensionality reduction by learning an invariant mapping. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1735–1742, 2006.

[14] H. Han, W.-Y. Wang, and B.-H. Mao. Borderline-smote: a new over-sampling method in imbalanced data sets learning. In International Conference on Intelligent Computing, pages 878–887. Springer, 2005.

[15] H. He and E. A. Garcia. Learning from imbalanced data. IEEE Transactions on knowledge and data engineering, 21(9):1263–1284, 2009.

[16] M. A. Hearst, S. T. Dumais, E. Osuna, J. Platt, and B. Scholkopf. Support vector machines. IEEE Intelligent Systems and their applications, 13(4):18–28, 1998.

[17] J. Hu, L. Shen, and G. Sun. Squeeze-and-excitation networks. arXiv preprint arXiv: https://arxiv.org/abs/1709.01507, 2017.

[18] C. Huang, Y. Li, C. Change Loy, and X. Tang. Learning deep representation for imbalanced classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5375–5384, 2016.

[19] C. Huang, C. C. Loy, and X. Tang. Discriminative sparse neighbor approximation for imbalanced learning. IEEE transactions on neural networks and learning systems, 29(5):1503–1513, 2018.

[20] P. Jeatrakul, K. W. Wong, and C. C. Fung. Classification of imbalanced data by combining the complementary neural network and smote algorithm. In International Conference on Neural Information Processing, pages 152–159. Springer, 2010.

[21] T. Jo and N. Japkowicz. Class imbalances versus small disjuncts. ACM Sigkdd Explorations Newsletter, 6(1):40–49, 2004.

[22] S. Khan, H. Rahmani, S. A. A. Shah, and M. Bennamoun. A guide to convolutional neural networks for computer vision. Synthesis Lectures on Computer Vision, 8(1):1–207, 2018.

[23] S. H. Khan, M. Hayat, M. Bennamoun, F. Sohel, and R. Togneri. Cost sensitive learning of deep feature representations from imbalanced data. IEEE Transactions on Neural Networks and Learning Systems, 2017.

[24] B. Krawczyk, M. Woniak, and G. Schaefer. Cost-sensitive decision tree ensembles for effective imbalanced classification. Applied Soft Computing, 14:554 – 562, 2014.

[25] M. Kubat. Addressing the curse of imbalanced training sets: One-sided selection. In Proceedings of the 14th International Conference on Machine Learning, pages 179–186. Morgan Kaufmann, 1997.

[26] S. Lawrence, I. Burns, A. Back, A. C. Tsoi, and C. L. Giles. Neural network classification and prior class probabilities. In Neural networks: Tricks of the trade, pages 295–309. Springer, 2012.

[27] G. B. Huang and E. Learned-Miller. Labeled faces in the wild: Updates and new reporting procedures. Technical Report UM-CS-2014-003, University of Massachusetts, Amherst, May 2014.

[28] C.-Y. Lee, P. W. Gallagher, and Z. Tu. Generalizing pooling functions in convolutional neural networks: Mixed, gated, and tree. In Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, pages 464–472, 2016.

[29] C.-Y. Lee, S. Xie, P. Gallagher, Z. Zhang, and Z. Tu. Deeply-supervised nets. In AISTATS, 2015.

[30] M. Li, X. Chen, X. Li, B. Ma, and P. M. Vitányi. The similarity metric. IEEE transactions on Information Theory, 50(12):3250–3264, 2004.

[31] J. Liu, Y. Deng, T. Bai, Z. Wei, and C. Huang. Targeting ultimate accuracy: Face recognition via deep embedding. arXiv preprint arXiv: https://arxiv.org/abs/1506.07310, 2015.

[32] W. Liu, Y. Wen, Z. Yu, M. Li, B. Raj, and L. Song. Sphereface: Deep hypersphere embedding for face recognition. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 6738–6746. IEEE, 2017.

[33] W. Liu, Y. Wen, Z. Yu, and M. Yang. Large-margin softmax loss for convolutional neural networks. In International Conference on Machine Learning, pages 507–516, 2016.

[34] I. Masi, A. T. Tran, T. Hassner, J. T. Leksut, and G. Medioni. Do we really need to collect millions of faces for effective face recognition? In European Conference on Computer Vision, pages 579–596. Springer, 2016.

[35] S. Moschoglou, A. Papaioannou, C. Sagonas, J. Deng, I. Kotsia, and S. Zafeiriou. AgeDB: the first manually collected, in-the-wild age database. In Proceedings of IEEE Intl Conf. on Computer Vision and Pattern Recognition (CVPR-W 2017), Honolulu, Hawaii, June 2017.

[36] W. W. Ng, G. Zeng, J. Zhang, D. S. Yeung, and W. Pedrycz. Dual autoencoders features for imbalance classification problem. Pattern Recognition, 60:875–889, 2016.

[37] O. M. Parkhi, A. Vedaldi, A. Zisserman, et al. Deep face recognition. In Proceedings of the British Machine Vision Conference, pages 6–14, 2015.

[38] M. D. Richard and R. P. Lippmann. Neural network classifiers estimate bayesian a posteriori probabilities. Neural computation, 3(4):461–483, 1991.

[39] S. Sengupta, J.-C. Chen, C. Castillo, V. M. Patel, R. Chellappa, and D. W. Jacobs. Frontal to profile face verification in the wild. In IEEE Conference on Applications of Computer Vision, February 2016.

[40] F. Schroff, D. Kalenichenko, and J. Philbin. Facenet: A unified embedding for face recognition and clustering. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 815–823, 2015.

[41] L. Shen, Z. Lin, and Q. Huang. Relay backpropagation for effective learning of deep convolutional neural networks. In European conference on computer vision, pages 467–482. Springer, 2016.

[42] V. S. Sheng, F. Provost, and P. G. Ipeirotis. Get another label? improving data quality and data mining using multiple, noisy labelers. In Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 614–622. ACM, 2008.

[43] J. Snell, K. Swersky, and R. Zemel. Prototypical networks for few-shot learning. In Advances in Neural Information Processing Systems, pages 4077–4087, 2017.

[44] Y. Sun, Y. Chen, X. Wang, and X. Tang. Deep learning face representation by joint identification-verification. In Advances in neural information processing systems, pages 1988–1996, 2014.

[45] Y. Sun, X. Wang, and X. Tang. Deeply learned face representations are sparse, selective, and robust. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2892–2900, 2015.

[46] Y. Taigman, M. Yang, M. Ranzato, and L. Wolf. Deepface: Closing the gap to human-level performance in face verification. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1701–1708, 2014.

[47] Y. Taigman, M. Yang, M. Ranzato, and L. Wolf. Webscale training for face identification. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2746–2754, 2015.

[48] Y. Tang, Y.-Q. Zhang, N. V. Chawla, and S. Krasser. Svms modeling for highly imbalanced classification. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 39(1):281–288, 2009.

[49] K. M. Ting. A comparative study of cost-sensitive boosting algorithms. In In Proceedings of the 17th International Conference on Machine Learning. Citeseer, 2000.

[50] O. Vinyals, C. Blundell, T. Lillicrap, D. Wierstra, et al. Matching networks for one shot learning. In Advances in Neural Information Processing Systems, pages 3630–3638, 2016.

[51] H. Wang, Y. Wang, Z. Zhou, X. Ji, D. Gong, J. Zhou, Z. Li, and W. Liu. Cosface: Large margin cosine loss for deep face recognition. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.

[52] S. Wang, W. Liu, J. Wu, L. Cao, Q. Meng, and P. J. Kennedy. Training deep neural networks on imbalanced data sets. In Neural Networks (IJCNN), 2016 International Joint Conference on, pages 4368–4374. IEEE, 2016.

[53] Y.-X. Wang, D. Ramanan, and M. Hebert. Learning to model the tail. In Advances in Neural Information Processing Systems, pages 7029–7039, 2017.

[54] Y. Wen, K. Zhang, Z. Li, and Y. Qiao. A discriminative feature learning approach for deep face recognition. In European Conference on Computer Vision, pages 499–515. Springer, 2016.

[55] L. Wolf, T. Hassner, and I. Maoz. Face recognition in unconstrained videos with matched background similarity. In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, pages 529–534. IEEE, 2011.

[56] Y. Wu, H. Liu, J. Li, and Y. Fu. Deep face recognition with center invariant loss. In Proceedings of the on Thematic Workshops of ACM Multimedia 2017, pages 408–414. ACM, 2017.

[57] Y. Yao, L. Rosasco, and A. Caponnetto. On early stopping in gradient descent learning. Constructive Approximation, 26(2):289–315, 2007.

[58] X. Yin, X. Yu, K. Sohn, X. Liu, and M. Chandraker. Feature transfer learning for deep face recognition with long-tail data. arXiv preprint arXiv: https://arxiv.org/abs/1803.09014, 2018.

[59] K. Zhang, Z. Zhang, Z. Li, and Y. Qiao. Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Processing Letters, 23(10):1499–1503, Oct 2016.

[60] X. Zhang, Z. Fang, Y. Wen, Z. Li, and Y. Qiao. Range loss for deep face recognition with long-tailed training data. In Proceedings of the IEEE International Conference on Computer Vision, pages 5409–5418, 2017.

[61] Z.-H. Zhou and X.-Y. Liu. On multi-class cost-sensitive learning. Computational Intelligence, 26(3):232–257, 2010.

33 mixup: Beyond Empirical Risk Minimization

LeNet ? VGG-11 Fast Gradient Sign Method (FGSM) Iterative FGSM (I-FGSM)
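mixup itself fits in a few lines: train on convex combinations of pairs of examples and their one-hot labels, with the mixing coefficient drawn from a Beta distribution. A minimal NumPy sketch:

```python
import numpy as np

def mixup_batch(x, y_onehot, alpha=0.2, seed=0):
    """Return mixed inputs and labels: lam*x_i + (1-lam)*x_j, same for labels."""
    rng = np.random.default_rng(seed)
    lam = rng.beta(alpha, alpha)        # mixing coefficient lambda ~ Beta(alpha, alpha)
    perm = rng.permutation(len(x))      # random pairing within the batch
    return lam * x + (1 - lam) * x[perm], lam * y_onehot + (1 - lam) * y_onehot[perm]
```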

REFERENCES

D. Amodei, S. Ananthanarayanan, R. Anubhai, J. Bai, E. Battenberg, C. Case, J. Casper, B. Catanzaro, Q. Cheng, G. Chen, et al. Deep speech 2: End-to-end speech recognition in English and Mandarin. In ICML, 2016.

D. Arpit, S. Jastrzebski, N. Ballas, D. Krueger, E. Bengio, M. S. Kanwal, T. Maharaj, A. Fischer, A. Courville, Y. Bengio, et al. A closer look at memorization in deep networks. ICML, 2017.

P. Bartlett, D. J. Foster, and M. Telgarsky. Spectrally-normalized margin bounds for neural networks. NIPS, 2017.

O. Chapelle, J. Weston, L. Bottou, and V. Vapnik. Vicinal risk minimization. NIPS, 2000.

N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer. SMOTE: synthetic minority over-sampling technique. Journal of artificial intelligence research, 16:321–357, 2002.

C. Chelba, T. Mikolov, M. Schuster, Q. Ge, T. Brants, P. Koehn, and T. Robinson. One billion word benchmark for measuring progress in statistical language modeling. arXiv, 2013. https://arxiv.org/abs/1312.3005

M. Cisse, P. Bojanowski, E. Grave, Y. Dauphin, and N. Usunier. Parseval networks: Improving robustness to adversarial examples. ICML, 2017.

W. M. Czarnecki, S. Osindero, M. Jaderberg, G. Świrszcz, and R. Pascanu. Sobolev training for neural networks. NIPS, 2017.

T. DeVries and G. W. Taylor. Dataset augmentation in feature space. ICLR Workshops, 2017.

H. Drucker and Y. Le Cun. Improving generalization performance using double backpropagation. IEEE Transactions on Neural Networks, 3(6):991–997, 1992.

I. Goodfellow. Tutorial: Generative adversarial networks. NIPS, 2016.

I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. NIPS, 2014.

I. J. Goodfellow, J. Shlens, and C. Szegedy. Explaining and harnessing adversarial examples. ICLR, 2015.

P. Goyal, P. Dollár, R. Girshick, P. Noordhuis, L. Wesolowski, A. Kyrola, A. Tulloch, Y. Jia, and K. He. Accurate, large minibatch SGD: Training ImageNet in 1 hour. arXiv, 2017.

A. Graves, A.-r. Mohamed, and G. Hinton. Speech recognition with deep recurrent neural networks. In ICASSP. IEEE, 2013.

I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. Courville. Improved training of Wasserstein GANs. NIPS, 2017.

N. Harvey, C. Liaw, and A. Mehrabian. Nearly-tight VC-dimension bounds for piecewise linear neural networks. JMLR, 2017.

K. He, X. Zhang, S. Ren, and J. Sun. Identity mappings in deep residual networks. ECCV, 2016.

M. Hein and M. Andriushchenko. Formal guarantees on the robustness of a classifier against adversarial manipulation. NIPS, 2017.

G. Hinton, L. Deng, D. Yu, G. E. Dahl, A.-r. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. N. Sainath, et al. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine, 2012.

G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger. Densely connected convolutional networks. CVPR, 2017.

D. Kingma and J. Ba. Adam: A method for stochastic optimization. ICLR, 2015.

A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. NIPS, 2012.

Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of IEEE, 2001. http://vision.stanford.edu/cs598_spring07/papers/Lecun98.pdf

M. Lichman. UCI machine learning repository, 2013.

K. Liu, 2017. URL https://github.com/kuangliu/pytorch-cifar.

G. Pereyra, G. Tucker, J. Chorowski, Ł. Kaiser, and G. Hinton. Regularizing neural networks by penalizing confident output distributions. ICLR Workshops, 2017.

O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei. ImageNet large scale visual recognition challenge. IJCV, 2015.

D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, et al. Mastering the game of Go with deep neural networks and tree search. Nature, 2016.

P. Simard, Y. LeCun, J. Denker, and B. Victorri. Transformation invariance in pattern recognition - tangent distance and tangent propagation. Neural networks: tricks of the trade, 1998.

K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. ICLR, 2015.

J. T. Springenberg, A. Dosovitskiy, T. Brox, and M. Riedmiller. Striving for simplicity: The all convolutional net. ICLR Workshops, 2015.

N. Srivastava, G. E. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(1):1929–1958, 2014.

C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. J. Goodfellow, and R. Fergus. Intriguing properties of neural networks. ICLR, 2014.

C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. Rethinking the Inception architecture for computer vision. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016.

V. N. Vapnik. Statistical learning theory. J. Wiley, 1998.

V. Vapnik and A. Y. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and its Applications, 1971.

A. Veit, 2017. URL https://github.com/andreasveit.

P. Warden, 2017. URL https://research.googleblog.com/2017/08/launching-speech-commands-dataset.html.

S. Xie, R. Girshick, P. Dollár, Z. Tu, and K. He. Aggregated residual transformations for deep neural networks. CVPR, 2016.

S. Zagoruyko and N. Komodakis. Wide residual networks. BMVC, 2016a.

S. Zagoruyko and N. Komodakis, 2016b. URL https://github.com/szagoruyko/wide-residual-networks.

C. Zhang, S. Bengio, M. Hardt, B. Recht, and O. Vinyals. Understanding deep learning requires rethinking generalization. ICLR, 2017.

C. Zhang, 2017. URL https://github.com/pluskid/fitting-random-labels.

Z. Zhong, L. Zheng, G. Kang, S. Li, and Y. Yang. Random erasing data augmentation. arXiv, 2017.

34 On Learning Density Aware Embeddings

3D, three dimensions

Density Aware Quadruplet Loss (DAQL) Density Aware Triplet Loss (DATL) Kernel Density Estimate (KDE)
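For reference, a minimal sketch of the standard triplet loss (Schroff et al., FaceNet) that the density-aware losses (DATL/DAQL) build on; the paper's density-aware margin itself is not reproduced here:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """max(0, ||a-p||^2 - ||a-n||^2 + margin), averaged over the batch."""
    d_ap = ((anchor - positive) ** 2).sum(-1)   # squared distance anchor-positive
    d_an = ((anchor - negative) ** 2).sum(-1)   # squared distance anchor-negative
    return np.maximum(d_ap - d_an + margin, 0.0).mean()
```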

References

[1] Himanshu S Bhatt, Richa Singh, Mayank Vatsa, and Nalini K Ratha. Improving cross-resolution face matching using ensemble-based co-transfer learning. IEEE TIP, 23(12):5654–5669, 2014.

[2] Soma Biswas, Gaurav Aggarwal, Patrick J Flynn, and Kevin W Bowyer. Pose-robust recognition of low-resolution face images. IEEE TPAMI, 35(12):3037–3049, 2013.

[3] Weihua Chen, Xiaotang Chen, Jianguo Zhang, and Kaiqi Huang. Beyond triplet loss: a deep quadruplet network for person re-identification. In IEEE CVPR, 2017.

[4] De Cheng, Yihong Gong, Sanping Zhou, Jinjun Wang, and Nanning Zheng. Person re-identification by multi-channel parts-based CNN with improved triplet loss function. In IEEE CVPR, pages 1335–1344, 2016.

[5] Adam Coates, Andrew Ng, and Honglak Lee. An analysis of single-layer networks in unsupervised feature learning. In AISTATS, pages 215–223, 2011.

[6] Dorin Comaniciu and Peter Meer. Mean shift: A robust approach toward feature space analysis. IEEE TPAMI, 24(5):603–619, 2002.

[7] Rishabh Garg, Yashasvi Baweja, Richa Singh, Mayank Vatsa, and Nalini Ratha. Heterogeneity aware deep embedding for mobile periocular recognition. In IEEE BTAS, 2018.

[8] Mislav Grgic, Kresimir Delac, and Sonja Grgic. SCface – surveillance cameras face database. Springer MTA, 51(3):863–879, 2011.

[9] Sanchit Gupta, Nikita Gupta, Soumyadeep Ghosh, Maneet Singh, Shruti Nagpal, Richa Singh, and Mayank Vatsa. FaceSurv: A benchmark video dataset for face detection and recognition across spectra and resolutions. In IEEE FG, 2019.

[10] Raia Hadsell, Sumit Chopra, and Yann LeCun. Dimensionality reduction by learning an invariant mapping. In IEEE CVPR, pages 1735–1742, 2006.

[11] Xufeng Han, Thomas Leung, Yangqing Jia, Rahul Sukthankar, and Alexander C Berg. MatchNet: Unifying feature and metric learning for patch-based matching. In IEEE CVPR, pages 3279–3286, 2015.

[12] Xinwei He, Yang Zhou, Zhichao Zhou, Song Bai, and Xiang Bai. Triplet-center loss for multi-view 3D object retrieval. In IEEE CVPR, 2018.

[13] Alexander Hermans, Lucas Beyer, and Bastian Leibe. In defense of the triplet loss for person re-identification. arXiv preprint arXiv:1703.07737, 2017.

[14] Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009.

[15] BG Kumar, Gustavo Carneiro, Ian Reid, et al. Learning local image descriptors with deep siamese and triplet convolutional networks by minimising global loss functions. In IEEE CVPR, pages 5385–5394, 2016.

[16] Xiaoxiang Liu, Lingxiao Song, Xiang Wu, and Tieniu Tan. Transferring deep representation for NIR-VIS heterogeneous face recognition. In IAPR ICB, pages 1–8, 2016.

[17] Ze Lu, Xudong Jiang, and Alex ChiChung Kot. Deep coupled ResNet for low-resolution face recognition. IEEE SPL, pages 2030–2035, 2018.

[18] Jonathan Masci, Davide Migliore, Michael M Bronstein, and Jürgen Schmidhuber. Descriptor learning for omnidirectional image matching. In RRIV, pages 49–62. Springer, 2014.

[19] Anish Mittal, Anush Krishna Moorthy, and Alan Conrad Bovik. No-reference image quality assessment in the spatial domain. IEEE TIP, 21(12):4695–4708, 2012.

[20] Florian Schroff, Dmitry Kalenichenko, and James Philbin. FaceNet: A unified embedding for face recognition and clustering. In IEEE CVPR, pages 815–823, 2015.

[21] Hailin Shi, Yang Yang, Xiangyu Zhu, Shengcai Liao, Zhen Lei, Weishi Zheng, and Stan Z Li. Embedding deep metric for person re-identification: A study against large variations. In ECCV, pages 732–748, 2016.

[22] Hyun Oh Song, Stefanie Jegelka, Vivek Rathod, and Kevin Murphy. Deep metric learning via facility location. In IEEE CVPR, pages 1014–1023, 2017.

[23] Yi Sun, Yuheng Chen, Xiaogang Wang, and Xiaoou Tang. Deep learning face representation by joint identification-verification. In NIPS, pages 1988–1996, 2014.

[24] Yi Sun, Xiaogang Wang, and Xiaoou Tang. Deep learning face representation from predicting 10,000 classes. In IEEE CVPR, pages 1891–1898, 2014.

[25] Kilian Q Weinberger and Lawrence K Saul. Distance metric learning for large margin nearest neighbor classification. JMLR, 10:207–244, 2009.

[26] Yandong Wen, Kaipeng Zhang, Zhifeng Li, and Yu Qiao. A discriminative feature learning approach for deep face recognition. In ECCV, pages 499–515, 2016.

[27] Paul Wohlhart and Vincent Lepetit. Learning descriptors for object recognition and 3D pose estimation. In IEEE CVPR, pages 3109–3118, 2015.

[28] Xiang Wu, Ran He, Zhenan Sun, and Tieniu Tan. A light CNN for deep face representation with noisy labels. IEEE TPAMI, 13(11):2884–2896, 2018.

[29] Fuwei Yang, Wenming Yang, Riqiang Gao, and Qingmin Liao. Discriminative multidimensional scaling for low-resolution face recognition. IEEE SPL, 25(3):388–392, 2018.

[30] Yuhui Yuan, Kuiyuan Yang, and Chao Zhang. Hard-aware deeply cascaded embedding. In IEEE ICCV, pages 1539–1547, 2017.

[31] Sergey Zagoruyko and Nikos Komodakis. Learning to compare image patches via convolutional neural networks. In IEEE CVPR, pages 4353–4361, 2015.

#35 On the Legal Comparability of Fairness Definitions

Department of Housing and Urban Development (HUD)
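The legal notions this paper compares (disparate treatment vs. disparate impact under HUD/EEOC doctrine) have a standard numeric proxy: the four-fifths rule on positive-outcome rates. A minimal sketch of that check, not legal advice and not the paper's formalization:

```python
import numpy as np

def disparate_impact_ratio(y_pred, group):
    """Ratio of positive-outcome rates between two groups; values below
    0.8 are flagged under the 'four-fifths rule'. Illustrative only."""
    rate_a = y_pred[group == 0].mean()
    rate_b = y_pred[group == 1].mean()
    lo, hi = sorted([rate_a, rate_b])
    return lo / hi

y_pred = np.array([1, 0, 1, 1, 0, 1, 0, 0])   # toy binary decisions
group  = np.array([0, 0, 0, 0, 1, 1, 1, 1])   # toy protected attribute
print(disparate_impact_ratio(y_pred, group))  # 0.25/0.75 = 0.33 -> flagged
```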

References
[1] Fisher v. Univ. of TX at Austin, 570 U.S. __ (2013).
[2] Griggs v. Duke Power Co., 401 U.S. 424 (1971).
[3] Grutter v. Bollinger, 539 U.S. 306 (2003).
[4] Regents of Univ. of California v. Bakke, 438 U.S. 265 (1978).
[5] Ricci v. DeStefano, 557 U.S. 557 (2009).
[6] Ricci v. DeStefano, 557 U.S. 557 (2009) (Scalia, J., concurring).
[7] Texas Dept. of Housing and Community Affairs v. Inclusive Communities Project, Inc., 576 U.S. __ (2015).
[8] HUD's implementation of the Fair Housing Act's disparate impact standard, 2019.
[9] Julia Angwin and Terry Parris Jr. Facebook lets advertisers exclude users by race, 2016.
[10] Bradley A Areheart. The symmetry principle. BCL Rev., 58:1085, 2017.
[11] Jack M Balkin and Reva B Siegel. The American civil rights tradition: Anticlassification or antisubordination. Issues in Legal Scholarship, 2(1), 2003.
[12] Solon Barocas and Andrew D Selbst. Big data's disparate impact. Calif. L. Rev., 104:671, 2016.
[13] Joy Buolamwini and Timnit Gebru. Gender shades: Intersectional accuracy disparities in commercial gender classification. In Conference on Fairness, Accountability and Transparency, pages 77–91, 2018.
[14] Danielle Keats Citron and Frank Pasquale. The scored society: Due process for automated predictions. Wash. L. Rev., 89:1, 2014.
[15] Sam Corbett-Davies and Sharad Goel. The measure and mismeasure of fairness: A critical review of fair machine learning. arXiv preprint arXiv:1808.00023, 2018.
[16] Kimberle Crenshaw. Demarginalizing the intersection of race and sex: A black feminist critique of antidiscrimination doctrine, feminist theory and antiracist politics. U. Chi. Legal F., page 139, 1989.
[17] Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard Zemel. Fairness through awareness. In Proceedings of the 3rd Innovations in Theoretical Computer Science Conference, pages 214–226. ACM, 2012.
[18] Michael Feldman, Sorelle A Friedler, John Moeller, Carlos Scheidegger, and Suresh Venkatasubramanian. Certifying and removing disparate impact. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 259–268. ACM, 2015.
[19] Sorelle A Friedler, Carlos Scheidegger, and Suresh Venkatasubramanian. On the (im)possibility of fairness. arXiv preprint arXiv:1609.07236, 2016.
[20] Jack Gillum and Ariana Tobin. Facebook won't let employers, landlords or lenders discriminate in ads anymore, 2019.
[21] Nina Grgić-Hlača, Muhammad Bilal Zafar, Krishna P Gummadi, and Adrian Weller. Beyond distributive fairness in algorithmic decision making: Feature selection for procedurally fair learning. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
[22] Moritz Hardt, Eric Price, Nati Srebro, et al. Equality of opportunity in supervised learning. In Advances in Neural Information Processing Systems, pages 3315–3323, 2016.
[23] Anna Lauren Hoffmann. Where fairness fails: data, algorithms, and the limits of antidiscrimination discourse. Information, Communication & Society, 22(7):900–915, 2019.
[24] Ben Hutchinson, KJ Pittl, and Margaret Mitchell. Interpreting social respect: A normative lens for ml models. arXiv preprint arXiv:1908.07336, 2019.
[25] Robert C Jacobs and Ruth S Sparrow. Fair Housing Act guidance memorandum of understanding of the Treasury, HUD, and Justice Departments. J. Affordable Hous. & Cmty. Dev. L., 10:16, 2000.
[26] Matthew Jagielski, Michael Kearns, Jieming Mao, Alina Oprea, Aaron Roth, Saeed Sharifi-Malvajerdi, and Jonathan Ullman. Differentially private fair learning. arXiv preprint arXiv:1812.02696, 2018.
[27] Christopher Jung, Michael Kearns, Seth Neel, Aaron Roth, Logan Stapleton, and Zhiwei Steven Wu. Eliciting and enforcing subjective individual fairness. arXiv preprint arXiv:1905.10660, 2019.
[28] Sampath Kannan, Aaron Roth, and Juba Ziani. Downstream effects of affirmative action. In Proceedings of the Conference on Fairness, Accountability, and Transparency, pages 240–248. ACM, 2019.
[29] Pauline T Kim. Data-driven discrimination at work. Wm. & Mary L. Rev., 58:857, 2016.
[30] Joshua A Kroll, Solon Barocas, Edward W Felten, Joel R Reidenberg, David G Robinson, and Harlan Yu. Accountable algorithms. U. Pa. L. Rev., 165:633, 2016.
[31] Matt J Kusner, Joshua Loftus, Chris Russell, and Ricardo Silva. Counterfactual fairness. In Advances in Neural Information Processing Systems, pages 4066–4076, 2017.
[32] Zachary Lipton, Julian McAuley, and Alexandra Chouldechova. Does mitigating ML's impact disparity require treatment disparity? In Advances in Neural Information Processing Systems, pages 8125–8135, 2018.
[33] Richard Primus. Of visible race-consciousness and institutional role: Equal protection and disparate impact after Ricci and Inclusive Communities. In Title VII of the Civil Rights Act After 50 Years: Proceedings of the New York University 67th Annual Conference on Labor (LexisNexis Publishing 2015 Forthcoming), 2015.
[34] George Rutherglen. Disparate impact under Title VII: an objective theory of discrimination. Va. L. Rev., 73:1297, 1987.
[35] Andrew D Selbst, Danah Boyd, Sorelle A Friedler, Suresh Venkatasubramanian, and Janet Vertesi. Fairness and abstraction in sociotechnical systems. In Proceedings of the Conference on Fairness, Accountability, and Transparency, pages 59–68. ACM, 2019.

#36 One-network Adversarial Fairness

Fair adversarial discriminative (FAD)
minibatch diversity (MD)
maximum mean discrepancy (MMD)
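MMD in the list above is the kernel two-sample statistic (Gretton et al., cited below) that adversarial-fairness methods use as a distribution-matching penalty between groups. A biased RBF-kernel estimate is a few lines of NumPy; the bandwidth here is an arbitrary choice:

```python
import numpy as np

def rbf(x, y, gamma):
    """RBF kernel matrix between sample sets x (n,d) and y (m,d)."""
    diff = x[:, None, :] - y[None, :, :]
    return np.exp(-gamma * np.sum(diff * diff, axis=2))

def mmd2(x, y, gamma=1.0):
    """Biased estimate of squared MMD between the samples x and y."""
    return rbf(x, x, gamma).mean() + rbf(y, y, gamma).mean() - 2.0 * rbf(x, y, gamma).mean()

rng = np.random.default_rng(0)
same = mmd2(rng.normal(size=(200, 2)), rng.normal(size=(200, 2)))
diff = mmd2(rng.normal(size=(200, 2)), rng.normal(2.0, 1.0, size=(200, 2)))
print(same, diff)  # near zero vs. clearly positive
```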

References
Agarwal, A.; Beygelzimer, A.; Dudik, M.; Langford, J.; and Wallach, H. 2018. A reductions approach to fair classification. ICML.
Ajakan, H.; Germain, P.; Larochelle, H.; Laviolette, F.; and Marchand, M. 2014. Domain adversarial neural networks. arXiv preprint arXiv:1412.4446.
Arjovsky, M.; Chintala, S.; and Bottou, L. 2017. Wasserstein generative adversarial networks. ICML.
Barocas, S., and Selbst, A. 2016. Big Data's Disparate Impact. California Law Review.
Bartlett, P., and Mendelson, S. 2002. Rademacher and Gaussian complexities: Risk bounds and structural results. JMLR.
Bechavod, Y., and Ligett, K. 2017. Penalizing unfairness in binary classification. arXiv preprint arXiv:1707.00044.
Ben-David, S.; Blitzer, J.; Crammer, K.; and Pereira, F. 2007. Analysis of representations for domain adaptation. NIPS 21:137–144.
Ben-David, S.; Blitzer, J.; Crammer, K.; Kulesza, A.; Pereira, F.; and Vaughan, J. 2010. A theory of learning from different domains. Machine Learning 79(2):151–175.
Beutel, A.; Chen, J.; Zhao, Z.; and Chi, E. 2017. Data decisions and theoretical implications when adversarially learning fair representations. arXiv preprint arXiv:1707.00075.
Brennan, T.; Dieterich, W.; and Ehret, B. 2009. Evaluating the predictive validity of the COMPAS risk and needs assessment system. Criminal Justice and Behavior 36:21–40.
Celis, E.; Huang, L.; Keswani, V.; and Vishnoi, N. 2018. Classification with fairness constraints: A meta-algorithm with provable guarantees. arXiv preprint arXiv:1806.06055.
Chouldechova, A. 2017. Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Big Data 2.
Devroye, L.; Györfi, L.; and Lugosi, G. 1996. A probabilistic theory of pattern recognition. Springer.
Dheeru, D., and Taniskidou, E. K. 2017. UCI ML Repository.
Edwards, H., and Storkey, A. 2016. Censoring representations with an adversary. ICLR.
Feldman, M.; Friedler, S.; Moeller, J.; Scheidegger, C.; and Venkatasubramanian, S. 2015. Certifying and removing disparate impact. KDD.
Fish, B.; Kun, J.; and Lelkes, A. 2016. A confidence-based approach for balancing fairness and accuracy. SDM.
Friedler, S.; Scheidegger, C.; Venkatasubramanian, S.; Choudhary, S.; Hamilton, E.; and Roth, D. 2018. A comparative study of fairness-enhancing interventions in machine learning. arXiv preprint arXiv:1802.04422.
Ganin, Y., and Lempitsky, V. 2015. Unsupervised domain adaptation by backpropagation. ICML 32.
Ganin, Y.; Ustinova, E.; Ajakan, H.; Germain, P.; Larochelle, H.; Laviolette, F.; Marchand, M.; and Lempitsky, V. 2016. Domain-adversarial training of neural networks. JMLR 17(59):1–35.
Goel, S.; Rao, J.; and Shroff, R. 2015. Precinct or prejudice? Understanding racial disparities in New York City's stop-and-frisk policy. Annals of Applied Statistics.
Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; and Bengio, Y. 2014. Generative adversarial nets. NIPS 2672–2680.
Goodfellow, I. 2016. NIPS 2016 tutorial: Generative adversarial networks. arXiv preprint arXiv:1701.00160.
Gretton, A.; Borgwardt, K.; Rasch, M.; Schölkopf, B.; and Smola, A. 2006. A kernel method for the two-sample-problem. NIPS.
Gretton, A.; Borgwardt, K.; Rasch, M.; Schölkopf, B.; and Smola, A. 2012. A kernel two-sample test. JMLR 13.
Grgić-Hlača, N.; Redmiles, E.; Gummadi, K.; and Weller, A. 2018a. Human perceptions of fairness in algorithmic decision making: A case study of criminal risk prediction. WWW.
Grgić-Hlača, N.; Zafar, M.; Gummadi, K.; and Weller, A. 2018b. Beyond distributive fairness in algorithmic decision making: Feature selection for procedurally fair learning. AAAI.
Hajian, S.; Domingo-Ferrer, J.; Monreale, A.; Pedreschi, D.; and Giannotti, F. 2015. Discrimination and privacy-aware patterns. Data Mining and Knowledge Discovery 1733–1782.
Hardt, M.; Price, E.; and Srebro, N. 2016. Equality of opportunity in supervised learning. NIPS.
Kamishima, T.; Akaho, S.; Asoh, H.; and Sakuma, J. 2012. Fairness-aware classifier with prejudice remover regularizer.
Kearns, M.; Neel, S.; Roth, A.; and Wu, Z. 2018. Preventing fairness gerrymandering: Auditing and learning for subgroup fairness. ICML.
Khandani, A.; Kim, A.; and Lo, A. 2010. Consumer credit-risk models via machine-learning algorithms. JBF 34:2767–2787.
Kifer, D.; Ben-David, S.; and Gehrke, J. 2004. Detecting change in data streams. VLDB 180–191.
Kim, T.; Cha, M.; Kim, H.; Lee, J.; and Kim, J. 2017. Learning to discover cross-domain relations with generative adversarial networks. ICML.
Kingma, D., and Ba, J. 2015. Adam: A Method for Stochastic Optimization. ICLR.
Koltchinskii, V., and Panchenko, D. 2000. Rademacher processes and bounding the risk of function learning. HDP.
Komiyama, J.; Takeda, A.; Honda, J.; and Shimao, H. 2018. Nonconvex optimization for regression with fairness constraints. ICML.
Kusner, M.; Loftus, J.; Russell, C.; and Silva, R. 2017. Counterfactual fairness. NIPS.
Larson, J.; Mattu, S.; Kirchner, L.; and Angwin, J. 2016. https://github.com/propublica/compas-analysis.
Lloyd, J., and Ghahramani, Z. 2015. Statistical model criticism using kernel two sample tests. NIPS.
Louizos, C.; Swersky, K.; Li, Y.; Welling, M.; and Zemel, R. 2016. The variational fair autoencoder. ICLR.
Louppe, G.; Kagan, M.; and Cranmer, K. 2017. Learning to pivot with adversarial networks. NIPS.
Madras, D.; Creager, E.; Pitassi, T.; and Zemel, R. 2018. Learning adversarially fair and transferable representations. ICML.
Mansour, Y.; Mohri, M.; and Rostamizadeh, A. 2009. Domain adaptation: Learning bounds and algorithms. COLT.
Metz, L.; Poole, B.; Pfau, D.; and Sohl-Dickstein, J. 2017. Unrolled generative adversarial networks. ICLR.
Mohri, M., and Rostamizadeh, A. 2008a. Rademacher complexity bounds for non-IID processes. NIPS.
Narasimhan, H. 2018. Learning with complex loss functions and constraints. AISTATS 1646–1654.
Primus, R. 2010. The future of disparate impact. Mich. Law Rev.
Rosca, M.; Lakshminarayanan, B.; Warde-Farley, D.; and Mohamed, S. 2017. Variational approaches for auto-encoding generative adversarial networks. arXiv preprint arXiv:1706.04987.
Salimans, T.; Goodfellow, I.; Zaremba, W.; Cheung, V.; and Radford, A. 2016. Improved techniques for training GANs. NIPS.
Schölkopf, B.; Williamson, R.; Smola, A.; Shawe-Taylor, J.; and Platt, J. 2000. Support vector method for novelty detection. NIPS.
Shalev-Shwartz, S., and Ben-David, S. 2014. Understanding machine learning: From theory to algorithms. Cambridge Univ. Press.
Srivastava, A.; Valkov, L.; Russell, C.; Gutmann, M.; and Sutton, C. 2017. VEEGAN: Reducing mode collapse in GANs using implicit variational learning. NIPS.
Wadsworth, C.; Vera, F.; and Piech, C. 2018. Achieving fairness through adversarial learning: an application to recidivism prediction. FAT/ML Workshop.
Xu, D.; Yuan, S.; Zhang, L.; and Wu, X. 2018. FairGAN: Fairness-aware generative adversarial networks. arXiv preprint arXiv:1805.11202.
Zafar, M.; Valera, I.; Rodriguez, M.; Gummadi, K.; and Weller, A. 2017a. From parity to preference-based notions of fairness in classification. NIPS.
Zafar, M.; Valera, I.; Rodriguez, M.; and Gummadi, K. 2017b. Fairness beyond disparate treatment and disparate impact: Learning classification without disparate mistreatment. WWW.
Zafar, M.; Valera, I.; Rodriguez, M.; and Gummadi, K. 2017c. Fairness constraints: Mechanisms for fair classification. AISTATS.
Zemel, R.; Wu, Y.; Swersky, K.; Pitassi, T.; and Dwork, C. 2013. Learning fair representations. ICML 325–333.
Zhang, B.; Lemoine, B.; and Mitchell, M. 2018. Mitigating unwanted biases with adversarial learning. arXiv:1801.07593.

#37 Online Learning with an Unknown Fairness Metric

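Before the references, a small sketch of the object this paper studies: Dwork-style individual fairness asks that |f(x_i) - f(x_j)| be bounded by a task-specific metric d(x_i, x_j), and here the metric is unknown and must be learned online from an auditor's complaints. The snippet below only checks violations under a *given* Mahalanobis metric; everything in it (the names, the identity matrix standing in for the unknown metric) is illustrative, not the paper's algorithm.

```python
import numpy as np

def mahalanobis(x1, x2, A):
    """Mahalanobis distance under a positive semi-definite matrix A."""
    d = x1 - x2
    return float(np.sqrt(d @ A @ d))

def fairness_violations(scores, X, A, eps=0.0):
    """Pairs (i, j) where the score gap exceeds the metric distance,
    i.e. violations of individual fairness under metric A. In the paper
    A is unknown and inferred from violation feedback; here it is given."""
    n = len(scores)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if abs(scores[i] - scores[j]) > mahalanobis(X[i], X[j], A) + eps]

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 3))
scores = rng.uniform(size=6)
A = np.eye(3)                       # stand-in for the unknown fairness metric
print(fairness_violations(scores, X, A))
```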
References
Yasin Abbasi-Yadkori, Dávid Pál, and Csaba Szepesvári. Improved algorithms for linear stochastic bandits. In Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011, Granada, Spain, pages 2312–2320, 2011. URL http://papers.nips.cc/paper/4417-improved-algorithms-for-linear-stochastic-bandits.
Richard Berk, Hoda Heidari, Shahin Jabbari, Michael Kearns, and Aaron Roth. Fairness in criminal justice risk assessments: the state of the art. arXiv preprint arXiv:1703.09207, 2017.
Alexandra Chouldechova. Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. arXiv preprint arXiv:1703.00056, 2017.
Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard Zemel. Fairness through awareness. In Proceedings of the 3rd Innovations in Theoretical Computer Science Conference, pages 214–226. ACM, 2012.
Sorelle A Friedler, Carlos Scheidegger, and Suresh Venkatasubramanian. On the (im)possibility of fairness. arXiv preprint arXiv:1609.07236, 2016.
Sara Hajian and Josep Domingo-Ferrer. A methodology for direct and indirect discrimination prevention in data mining. IEEE Transactions on Knowledge and Data Engineering, 25(7):1445–1459, 2013.
Moritz Hardt, Eric Price, and Nathan Srebro. Equality of opportunity in supervised learning. Advances in Neural Information Processing Systems, 2016.
Ursula Hébert-Johnson, Michael P Kim, Omer Reingold, and Guy N Rothblum. Calibration for the (computationally-identifiable) masses. arXiv preprint arXiv:1711.08513, 2017.
Wassily Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301):13–30, 1963.
Shahin Jabbari, Matthew Joseph, Michael Kearns, Jamie Morgenstern, and Aaron Roth. Fairness in reinforcement learning. In International Conference on Machine Learning, pages 1617–1626, 2017.
Prateek Jain, Brian Kulis, Inderjit S Dhillon, and Kristen Grauman. Online metric learning and fast similarity search. In Advances in Neural Information Processing Systems, pages 761–768, 2009.
Matthew Joseph, Michael Kearns, Jamie H Morgenstern, and Aaron Roth. Fairness in learning: Classic and contextual bandits. In Advances in Neural Information Processing Systems, pages 325–333, 2016a.
Matthew Joseph, Michael J. Kearns, Jamie Morgenstern, Seth Neel, and Aaron Roth. Fair algorithms for infinite and contextual bandits. CoRR, abs/1610.09559, 2016b. URL http://arxiv.org/abs/1610.09559.
Faisal Kamiran and Toon Calders. Data preprocessing techniques for classification without discrimination. Knowledge and Information Systems, 33(1):1–33, 2012.
Michael Kearns, Seth Neel, Aaron Roth, and Zhiwei Steven Wu. Preventing fairness gerrymandering: Auditing and learning for subgroup fairness. arXiv preprint arXiv:1711.05144, 2017.
Michael P Kim, Omer Reingold, and Guy N Rothblum. Fairness through computationally-bounded awareness. arXiv preprint arXiv:1803.03239, 2018.
Jon Kleinberg, Sendhil Mullainathan, and Manish Raghavan. Inherent trade-offs in the fair determination of risk scores. In Proceedings of the 2017 ACM Conference on Innovations in Theoretical Computer Science, Berkeley, CA, USA, 2017.
Brian Kulis et al. Metric learning: A survey. Foundations and Trends® in Machine Learning, 5(4):287–364, 2013.
Yang Liu, Goran Radanovic, Christos Dimitrakakis, Debmalya Mandal, and David C Parkes. Calibrated fairness in bandits. arXiv preprint arXiv:1707.01875, 2017.
Ilan Lobel, Renato Paes Leme, and Adrian Vladu. Multidimensional binary search for contextual decision-making. In Proceedings of the 2017 ACM Conference on Economics and Computation, EC '17, Cambridge, MA, USA, June 26-30, 2017, page 585, 2017. doi: 10.1145/3033274.3085100. URL http://doi.acm.org/10.1145/3033274.3085100.
Guy N Rothblum and Gal Yona. Probably approximately metric-fair learning. arXiv preprint arXiv:1803.03242, 2018.
Muhammad Bilal Zafar, Isabel Valera, Manuel Gomez Rodriguez, and Krishna P Gummadi. Fairness beyond disparate treatment & disparate impact: Learning classification without disparate mistreatment. In Proceedings of the 26th International Conference on World Wide Web, pages 1171–1180. International World Wide Web Conferences Steering Committee, 2017.
Rich Zemel, Yu Wu, Kevin Swersky, Toni Pitassi, and Cynthia Dwork. Learning fair representations. In International Conference on Machine Learning, pages 325–333, 2013.

#38 Oversampling for Imbalanced Data via Optimal Transport

ADASYN, DANGER, MWMOTE, GRF, HKRGC, LIBSVM (https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets)

Optimal Transport for OverSampling (OTOS)
true positive (TP)
false positive (FP)
false negative (FN)
true negative (TN)
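OTOS generates minority samples and then moves them toward the minority-class distribution with entropically regularized optimal transport; the Sinkhorn iteration (Cuturi 2013, cited below) is the computational core. A minimal sketch with uniform marginals and purely illustrative data:

```python
import numpy as np

def sinkhorn(C, reg=0.1, iters=200):
    """Entropically regularized OT plan for cost matrix C, uniform
    marginals on both sides (Sinkhorn/Cuturi). Illustrative sketch."""
    n, m = C.shape
    K = np.exp(-C / reg)
    a, b = np.ones(n) / n, np.ones(m) / m   # source/target marginals
    v = np.ones(m)
    for _ in range(iters):
        u = a / (K @ v)                     # alternating scaling updates
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]      # transport plan

rng = np.random.default_rng(0)
src = rng.normal(size=(5, 2))               # e.g. raw synthetic minority seeds
dst = rng.normal(1.0, 1.0, size=(7, 2))     # e.g. real minority samples
C = np.linalg.norm(src[:, None] - dst[None, :], axis=2) ** 2
P = sinkhorn(C)
print(P.sum(axis=1), P.sum(axis=0))         # both ≈ uniform marginals
```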

References
Abdi, L., and Hashemi, S. 2016. To combat multi-class imbalanced problems by means of over-sampling techniques. IEEE Transactions on Knowledge and Data Engineering 28(1):238–251.
Arjovsky, M.; Chintala, S.; and Bottou, L. 2017. Wasserstein generative adversarial networks. In ICML, 214–223.
Barua, S.; Islam, M. M.; Yao, X.; and Murase, K. 2014. MWMOTE – majority weighted minority oversampling technique for imbalanced data set learning. IEEE Transactions on Knowledge and Data Engineering 26(2):405–425.
Bellinger, C.; Drummond, C.; and Japkowicz, N. 2018. Manifold-based synthetic oversampling with manifold conformance estimation. Machine Learning 107(3):605–637.
Benamou, J.-D.; Carlier, G.; Cuturi, M.; Nenna, L.; and Peyré, G. 2015. Iterative Bregman projections for regularized transportation problems. SIAM Journal on Scientific Computing 37(2):A1111–A1138.
Bhattacharya, S.; Rajan, V.; and Shrivastava, H. 2017. ICU mortality prediction: A classification algorithm for imbalanced datasets. In AAAI, 1288–1294.
Branco, P.; Torgo, L.; and Ribeiro, R. P. 2016. A survey of predictive modeling on imbalanced domains. ACM Computing Surveys 49(2):31.
Chawla, N. V.; Bowyer, K. W.; Hall, L. O.; and Kegelmeyer, W. P. 2002. SMOTE: synthetic minority over-sampling technique. Journal of Artificial Intelligence Research 16:321–357.
Courty, N.; Flamary, R.; Habrard, A.; and Rakotomamonjy, A. 2017a. Joint distribution optimal transportation for domain adaptation. In NIPS, 3733–3742.
Courty, N.; Flamary, R.; Tuia, D.; and Rakotomamonjy, A. 2017b. Optimal transport for domain adaptation. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(9):1853–1865.
Cuturi, M., and Doucet, A. 2014. Fast computation of Wasserstein barycenters. In ICML, 685–693.
Cuturi, M. 2013. Sinkhorn distances: Lightspeed computation of optimal transport. In NIPS, 2292–2300.
Das, B.; Krishnan, N. C.; and Cook, D. J. 2015. RACOG and wRACOG: Two probabilistic oversampling techniques. IEEE Transactions on Knowledge and Data Engineering 27(1):222.
Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; and Fei-Fei, L. 2009. ImageNet: A large-scale hierarchical image database. In CVPR, 248–255.
Fan, R.-E.; Chang, K.-W.; Hsieh, C.-J.; Wang, X.-R.; and Lin, C.-J. 2008. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research 9(Aug):1871–1874.
Fernández, A.; Garcia, S.; Herrera, F.; and Chawla, N. V. 2018. SMOTE for learning from imbalanced data: Progress and challenges, marking the 15-year anniversary. Journal of Artificial Intelligence Research 61:863–905.
González, S.; García, S.; Li, S.-T.; and Herrera, F. 2019. Chain based sampling for monotonic imbalanced classification. Information Sciences 474:187–204.
Han, H.; Wang, W.-Y.; and Mao, B.-H. 2005. Borderline-SMOTE: a new over-sampling method in imbalanced data sets learning. In ICIC, 878–887.
He, H., and Garcia, E. A. 2008. Learning from imbalanced data. IEEE Transactions on Knowledge and Data Engineering.
He, H.; Bai, Y.; Garcia, E. A.; and Li, S. 2008. ADASYN: Adaptive synthetic sampling approach for imbalanced learning. In IJCNN, 1322–1328.
He, K.; Zhang, X.; Ren, S.; and Sun, J. 2016. Deep residual learning for image recognition. In CVPR, 770–778.
Kantorovitch, L. 1958. On the translocation of masses. Management Science 5(1):1–4.
Lemaître, G.; Nogueira, F.; and Aridas, C. K. 2017. Imbalanced-learn: A python toolbox to tackle the curse of imbalanced datasets in machine learning. Journal of Machine Learning Research 18(1):559–563.
Lin, C.-T.; Hsieh, T.-Y.; Liu, Y.-T.; Lin, Y.-Y.; Fang, C.-N.; Wang, Y.-K.; Yen, G.; Pal, N. R.; and Chuang, C.-H. 2018. Minority oversampling in kernel adaptive subspaces for class imbalanced datasets. IEEE Transactions on Knowledge and Data Engineering.
Liu, M.; Xu, C.; Luo, Y.; Xu, C.; Wen, Y.; and Tao, D. 2017. Cost-sensitive feature selection via f-measure optimization reduction. In AAAI, 2252–2258.
Monge, G. 1781. Mémoire sur la théorie des déblais et des remblais. Histoire de l'Académie Royale des Sciences de Paris.
Peng, Y. 2015. Adaptive sampling with optimal cost for class-imbalance learning. In AAAI, volume 15, 2921–2927.
Pérez-Ortiz, M.; Gutiérrez, P. A.; Tino, P.; and Hervás-Martínez, C. 2016. Oversampling the minority class in the feature space. IEEE Transactions on Neural Networks and Learning Systems 27(9):1947–1961.
Peyré, G., and Cuturi, M. 2017. Computational optimal transport.
Peyré, G.; Cuturi, M.; and Solomon, J. 2016. Gromov-Wasserstein averaging of kernel and distance matrices. In ICML, 2664–2672.
Sen, A.; Islam, M. M.; Murase, K.; and Yao, X. 2016. Binarization with boosting and oversampling for multiclass classification. IEEE Transactions on Cybernetics 46(5):1078–1091.
Sinkhorn, R. 1967. Diagonal equivalence to matrices with prescribed row and column sums. The American Mathematical Monthly 74(4):402–405.
Villani, C. 2008. Optimal transport: old and new, volume 338. Springer Science & Business Media.
Yan, Y.; Li, W.; Wu, H.; Min, H.; Tan, M.; and Wu, Q. 2018. Semi-supervised optimal transport for heterogeneous domain adaptation. In IJCAI, 737–753.
Zhang, Y.; Zhao, P.; Cao, J.; Ma, W.; Huang, J.; Wu, Q.; and Tan, M. 2018. Online adaptive asymmetric active learning for budgeted imbalanced data. In SIGKDD, 2768–2777.

#39 Practical Black-Box Attacks against Machine Learning

API
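#39 trains a local substitute model on labels queried from the victim's prediction API, then transfers gradient-based adversarial examples such as FGSM [4] to the black-box target. The sketch below shows only the FGSM step on a toy logistic "substitute"; the model and numbers are illustrative, not the paper's setup.

```python
import numpy as np

def fgsm(x, grad_wrt_x, eps=0.1):
    """Fast gradient sign perturbation; in the black-box attack these are
    crafted on the locally trained substitute and transferred."""
    return x + eps * np.sign(grad_wrt_x)

# Toy logistic substitute: loss = -log sigmoid(y * w.x)
w = np.array([1.0, -2.0, 0.5])
x = np.array([0.2, 0.4, -0.1])
y = 1.0
sigma = 1.0 / (1.0 + np.exp(-y * (w @ x)))
grad_x = -(1.0 - sigma) * y * w          # d loss / d x for this loss
x_adv = fgsm(x, grad_x, eps=0.1)
print(w @ x, w @ x_adv)                  # score for the true class drops
```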

REFERENCES
[1] Marco Barreno, et al. Can machine learning be secure? In Proceedings of the 2006 ACM Symposium on Information, Computer and Communications Security.
[2] Battista Biggio, et al. Evasion attacks against machine learning at test time. In Machine Learning and Knowledge Discovery in Databases, pages 387–402. Springer, 2013.
[3] Ian Goodfellow, et al. Deep learning. Book in preparation for MIT Press (www.deeplearningbook.org), 2016.
[4] Ian J Goodfellow, et al. Explaining and harnessing adversarial examples. In Proceedings of the International Conference on Learning Representations, 2015.
[5] Ling Huang, et al. Adversarial machine learning. In Proceedings of the 4th ACM Workshop on Security and Artificial Intelligence, pages 43–58, 2011.
[6] Alexey Kurakin, et al. Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533, 2016.
[7] Yann LeCun et al. The MNIST database of handwritten digits, 1998.
[8] Erich L. Lehmann, et al. Testing Statistical Hypotheses. Springer Texts in Statistics, August 2008.
[9] Nicolas Papernot, et al. The limitations of deep learning in adversarial settings. In Proceedings of the 1st IEEE European Symposium on Security and Privacy, 2016.
[10] Nicolas Papernot, et al. Distillation as a defense to adversarial perturbations against deep neural networks. In Proceedings of the 37th IEEE Symposium on Security and Privacy.
[11] Mahmood Sharif, et al. Accessorize to a crime: Real and stealthy attacks on state-of-the-art face recognition. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. ACM, 2016.
[12] Nedim Šrndić, et al. Practical evasion of a learning-based classifier: A case study. In Proceedings of the 35th IEEE Symposium on Security and Privacy.
[13] Johannes Stallkamp, et al. Man vs. computer: Benchmarking machine learning algorithms for traffic sign recognition. Neural Networks, 32:323–332, 2012.
[14] Christian Szegedy, et al. Intriguing properties of neural networks. In Proceedings of the International Conference on Learning Representations, 2014.
[15] Florian Tramèr, et al. Stealing machine learning models via prediction APIs. In 25th USENIX Security Symposium, 2016.
[16] Jeffrey S Vitter. Random sampling with a reservoir. ACM Transactions on Mathematical Software, 1985.
[17] D Warde-Farley, et al. Adversarial perturbations of deep neural networks. Advanced Structured Prediction, 2016.
[18] Weilin Xu, et al. Automatically evading classifiers. In Proceedings of the 2016 Network and Distributed Systems Symposium.

#40 Predict Responsibly: Improving Fairness and Accuracy by Learning to Defer

decision-maker’s (DM’s)
minimum subgroup accuracy (MSA)
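The paper couples a classifier with a downstream decision-maker (DM) and learns when to defer to them. A fixed-threshold Chow-style reject rule [8, 9] plus the MSA metric gives the flavor; the learned, DM-aware deferral policy of the paper is not reproduced here, and the threshold below is an arbitrary stand-in.

```python
import numpy as np

def predict_or_defer(probs, threshold=0.8):
    """Output a class when confident, otherwise defer (-1) to the DM.
    A simplified Chow-style reject option, not the paper's learned policy."""
    conf = probs.max(axis=1)
    preds = probs.argmax(axis=1)
    return np.where(conf >= threshold, preds, -1)

def min_subgroup_accuracy(y_true, y_pred, group):
    """Minimum subgroup accuracy (MSA): accuracy of the worst-off group."""
    return min((y_pred[group == g] == y_true[group == g]).mean()
               for g in np.unique(group))

probs = np.array([[0.9, 0.1], [0.55, 0.45], [0.2, 0.8]])
print(predict_or_defer(probs))                       # [0, -1, 1]
y_true = np.array([0, 1, 1]); group = np.array([0, 0, 1])
print(min_subgroup_accuracy(y_true, probs.argmax(axis=1), group))  # 0.5
```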

References
[1] Josh Attenberg, Panagiotis G Ipeirotis, and Foster J Provost. Beat the machine: Challenging workers to find the unknown unknowns. Human Computation, 11(11), 2011.
[2] Yahav Bechavod and Katrina Ligett. Learning Fair Classifiers: A Regularization-Inspired Approach. Workshop on Fairness, Accountability, and Transparency in Machine Learning, June 2017. arXiv: 1707.00044.
[3] Charles Blundell, Julien Cornebise, Koray Kavukcuoglu, and Daan Wierstra. Weight uncertainty in neural network. In Francis Bach and David Blei, editors, Proceedings of the 32nd International Conference on Machine Learning, volume 37 of Proceedings of Machine Learning Research, pages 1613–1622, Lille, France, 07–09 Jul 2015. PMLR. URL http://proceedings.mlr.press/v37/blundell15.html.
[4] Amanda Bower, Sarah N. Kitchen, Laura Niss, Martin J. Strauss, Alexander Vargas, and Suresh Venkatasubramanian. Fair Pipelines. Workshop on Fairness, Accountability, and Transparency in Machine Learning, July 2017. arXiv: 1707.00391.
[5] Jenna Burrell. How the machine ‘thinks’: Understanding opacity in machine learning algorithms. Big Data & Society, 3(1):2053951715622512, 2016.
[6] Lowell W Busenitz and Jay B Barney. Differences between entrepreneurs and managers in large organizations: Biases and heuristics in strategic decision-making. Journal of Business Venturing, 12(1):9–30, 1997.
[7] Alexandra Chouldechova. Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Big Data, 5(2):153–163, 2017.
[8] C. Chow. An optimum character recognition system using decision function. IEEE T. C., 1957.
[9] C. Chow. On optimum recognition error and reject trade-off. IEEE T. C., 1970.
[10] Corinna Cortes, Giulia DeSalvo, and Mehryar Mohri. Learning with rejection. In International Conference on Algorithmic Learning Theory, pages 67–82. Springer, 2016.
[11] Shai Danziger, Jonathan Levav, and Liora Avnaim-Pesso. Extraneous factors in judicial decisions. Proceedings of the National Academy of Sciences, 108(17):6889–6892, 2011.
[12] Robyn M Dawes, David Faust, and Paul E Meehl. Clinical versus actuarial judgment. Science, 243(4899):1668–1674, 1989.
[13] Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard Zemel. Fairness through awareness. In Proceedings of the 3rd Innovations in Theoretical Computer Science Conference, pages 214–226. ACM, 2012.
[14] Andre Esteva, Brett Kuprel, Roberto A Novoa, Justin Ko, Susan M Swetter, Helen M Blau, and Sebastian Thrun. Dermatologist-level classification of skin cancer with deep neural networks. Nature, 542(7639):115–118, 2017.
[15] Lydia Fischer and Thomas Villmann. A probabilistic classifier model with adaptive rejection option. Technical Report 1865-3960, January 2016. URL https://www.techfak.uni-bielefeld.de/~fschleif/mlr/mlr_01_2016.pdf.
[16] Nina Grgic-Hlaca, Muhammad Bilal Zafar, Krishna P. Gummadi, and Adrian Weller. On Fairness, Diversity and Randomness in Algorithmic Decision Making. Workshop on Fairness, Accountability, and Transparency in Machine Learning, June 2017. arXiv: 1706.10208.
[17] Chuan Guo, Geoff Pleiss, Yu Sun, and Kilian Q. Weinberger. On calibration of modern neural networks. In Doina Precup and Yee Whye Teh, editors, Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 1321–1330, International Convention Centre, Sydney, Australia, 06–11 Aug 2017. PMLR. URL http://proceedings.mlr.press/v70/guo17a.html.
[18] Dylan Hadfield-Menell, Stuart J Russell, Pieter Abbeel, and Anca Dragan. Cooperative inverse reinforcement learning. In Advances in Neural Information Processing Systems, pages 3909–3917, 2016.
[19] Kelly Hannah-Moffat. Actuarial sentencing: An “unsettled” proposition. Justice Quarterly, 30(2):270–296, 2013.
[20] Moritz Hardt, Eric Price, Nati Srebro, et al. Equality of opportunity in supervised learning. In Advances in Neural Information Processing Systems, pages 3315–3323, 2016.
[21] Robert A Jacobs, Michael I Jordan, Steven J Nowlan, and Geoffrey E Hinton. Adaptive mixtures of local experts. Neural Computation, 3(1):79–87, 1991.
[22] Eric Jang, Shixiang Gu, and Ben Poole. Categorical reparameterization with Gumbel-Softmax. International Conference on Learning Representations, 2016.
[23] Matthew Joseph, Michael Kearns, Jamie H Morgenstern, and Aaron Roth. Fairness in learning: Classic and contextual bandits. In Advances in Neural Information Processing Systems, pages 325–333, 2016.
[24] F. Kamiran and T. Calders. Classifying without discriminating. In 2nd International Conference on Computer, Control and Communication, 2009. IC4 2009, pages 1–6, February 2009. doi: 10.1109/IC4.2009.4909197.
[25] Toshihiro Kamishima, Shotaro Akaho, Hideki Asoh, and Jun Sakuma. Fairness-aware classifier with prejudice remover regularizer. Machine Learning and Knowledge Discovery in Databases, pages 35–50, 2012.
[26] Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. International Conference on Learning Representations, 2014.
[27] Lauren Kirchner and Jeff Larson. How we analyzed the COMPAS recidivism algorithm. 2016. URL https://www.propublica.org/article/how-we-analyzed-the-compas-recidivism-algorithm.
[28] Jon Kleinberg, Sendhil Mullainathan, and Manish Raghavan. Inherent Trade-Offs in the Fair Determination of Risk Scores. Innovations in Theoretical Computer Science Conference, September 2016. arXiv: 1609.05807.
[29] Chris J Maddison, Andriy Mnih, and Yee Whye Teh. The concrete distribution: A continuous relaxation of discrete random variables. International Conference on Learning Representations, 2017.
[30] Aditya Krishna Menon and Robert C Williamson. The cost of fairness in binary classification. In Conference on Fairness, Accountability and Transparency, pages 107–118, 2018.
[31] Geoff Pleiss, Manish Raghavan, Felix Wu, Jon Kleinberg, and Kilian Q Weinberger. On fairness and calibration. In Advances in Neural Information Processing Systems, pages 5680–5689, 2017.
[32] Kush R Varshney and Homa Alemzadeh. On the safety of machine learning: Cyber-physical systems, decision sciences, and data products. Big Data, 5(3):246–255, 2017.
[33] Xin Wang, Yujia Luo, Daniel Crankshaw, Alexey Tumanov, and Joseph E. Gonzalez. IDK Cascades: Fast Deep Learning by Learning not to Overthink. Conference on Uncertainty in Artificial Intelligence, June 2017. arXiv: 1706.00885.
[34] Muhammad Bilal Zafar, Isabel Valera, Manuel Gomez Rodriguez, and Krishna P Gummadi. Fairness beyond disparate treatment & disparate impact: Learning classification without disparate mistreatment. In Proceedings of the 26th International Conference on World Wide Web, pages 1171–1180. International World Wide Web Conferences Steering Committee, 2017.
[35] Richard Zemel, Yu Wu, Kevin Swersky, Toni Pitassi, and Cynthia Dwork. Learning Fair Representations. In PMLR, pages 325–333, February 2013. URL http://proceedings.mlr.press/v28/zemel13.html.

#41 PyTorch: An Imperative Style, High-Performance Deep Learning Library

APL, CNTK, NumPy, SciPy, Pandas, FIFO, CPU
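Before the references, a toy loop makes the "imperative style" of the title concrete: models, losses, and the backward pass are ordinary Python statements executed eagerly (define-by-run). This requires the real `torch` package; the data and model here are toys.

```python
import torch

# Toy data: y = 3x + noise. Autograd records operations as they execute.
x = torch.randn(64, 1)
y = 3 * x + 0.1 * torch.randn(64, 1)

model = torch.nn.Linear(1, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

for step in range(100):
    loss = torch.nn.functional.mse_loss(model(x), y)
    opt.zero_grad()
    loss.backward()          # reverse-mode autodiff over the recorded tape
    opt.step()

print(model.weight.item())   # ≈ 3
```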

References
[1] Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell. Caffe: Convolutional architecture for fast feature embedding. arXiv preprint arXiv:1408.5093, 2014.
[2] Frank Seide and Amit Agarwal. CNTK: Microsoft's open-source deep-learning toolkit. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '16, pages 2135–2135, New York, NY, USA, 2016. ACM.
[3] Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dandelion Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org.
[4] Theano Development Team. Theano: A Python framework for fast computation of mathematical expressions. arXiv e-prints, abs/1605.02688, May 2016.
[5] Seiya Tokui, Kenta Oono, Shohei Hido, and Justin Clayton. Chainer: a next-generation open source framework for deep learning. In Proceedings of Workshop on Machine Learning Systems (LearningSys) in The Twenty-ninth Annual Conference on Neural Information Processing Systems (NIPS), 2015.
[6] Ronan Collobert, Samy Bengio, and Johnny Mariéthoz. Torch: a modular machine learning software library. Technical report, Idiap, 2002.
[7] G. Neubig, C. Dyer, Y. Goldberg, A. Matthews, W. Ammar, A. Anastasopoulos, M. Ballesteros, D. Chiang, D. Clothiaux, T. Cohn, K. Duh, M. Faruqui, C. Gan, D. Garrette, Y. Ji, L. Kong, A. Kuncoro, G. Kumar, C. Malaviya, P. Michel, Y. Oda, M. Richardson, N. Saphra, S. Swayamdipta, and P. Yin. DyNet: The Dynamic Neural Network Toolkit. ArXiv e-prints, January 2017.
[8] Philip S. Abrams. An APL Machine. PhD thesis, Stanford University, 1970.
[9] The MathWorks, Inc., Natick, Massachusetts, United States. MATLAB and Statistics Toolbox.
[10] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
[11] Jeff Bezanson, Alan Edelman, Stefan Karpinski, and Viral B Shah. Julia: A fresh approach to numerical computing. SIAM Review, 59(1):65–98, 2017.
[12] Travis Oliphant. NumPy: A guide to NumPy. USA: Trelgol Publishing, 2006. http://www.numpy.org/.
[13] Gaël Guennebaud, Benoît Jacob, et al. Eigen v3. http://eigen.tuxfamily.org, 2010.
[14] Y LeCun and L Bottou. Lush reference manual. Technical report, code available at http://lush.sourceforge.net, 2002.
[15] Atilim Gunes Baydin, Barak A. Pearlmutter, Alexey Andreyevich Radul, and Jeffrey Mark Siskind. Automatic differentiation in machine learning: A survey. J. Mach. Learn. Res., 18(1):5595–5637, January 2017.
[16] Dougal Maclaurin. Modeling, Inference and Optimization with Composable Differentiable Procedures. PhD thesis, Harvard University, April 2016.
[17] Matthew Johnson et al. Jax. https://github.com/google/jax, 2018.
[18] Mike Innes et al. Flux.jl. https://github.com/FluxML/Flux.jl, 2018.
[19] Eric Jones, Travis Oliphant, Pearu Peterson, et al. SciPy: Open source scientific tools for Python, 2001–. http://www.scipy.org/.
[20] Wes McKinney. Data structures for statistical computing in python. In Proceedings of the 9th Python in Science Conference, 51–56, 2010.
[21] Pierre Sermanet, Koray Kavukcuoglu, and Yann LeCun. EBLearn: Open-source energy-based learning in C++. In 2009 21st IEEE International Conference on Tools with Artificial Intelligence, pages 693–697. IEEE, 2009.
[22] Sharan Chetlur, Cliff Woolley, Philippe Vandermersch, Jonathan D. Cohen, John Tran, Bryan Catanzaro, and Evan Shelhamer. cuDNN: Efficient primitives for deep learning. CoRR, abs/1410.0759, 2014.
[23] Andrew Lavin. maxDNN: An efficient convolution kernel for deep learning with Maxwell GPUs, January 2015.
[24] Andrew Lavin and Scott Gray. Fast algorithms for convolutional neural networks. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 4013–4021, 2016.
[25] Ronan Collobert, Koray Kavukcuoglu, and Clément Farabet. Torch7: A matlab-like environment for machine learning. In NIPS 2011, 2011.
[26] Richard Gabriel. The rise of worse is better. http://dreamsongs.com/RiseOfWorseIsBetter.html.
[27] Yann LeCun and Corinna Cortes. MNIST handwritten digit database. http://yann.lecun.com/exdb/mnist/.
[28] Oriol Vinyals, Timo Ewalds, Sergey Bartunov, Petko Georgiev, Alexander Sasha Vezhnevets, Michelle Yeo, Alireza Makhzani, Heinrich Küttler, John Agapiou, Julian Schrittwieser, John Quan, Stephen Gaffney, Stig Petersen, Karen Simonyan, Tom Schaul, Hado van Hasselt, David Silver, Timothy P. Lillicrap, Kevin Calderone, Paul Keet, Anthony Brunasso, David Lawrence, Anders Ekermo, Jacob Repp, and Rodney Tsing. StarCraft II: A new challenge for reinforcement learning. CoRR, abs/1708.04782, 2017.
[29] DMLC. DLPack: Open in-memory tensor structure. https://github.com/dmlc/dlpack.
[30] Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in PyTorch. In NIPS Workshop, 2017.
[31] Dan Piponi. Automatic differentiation, C++ templates, and photogrammetry. J. Graphics, GPU, & Game Tools, 9(4):41–55, 2004.
[32] Holger Leuck and Hans-Hellmut Nagel. Automatic differentiation facilitates OF-integration into steering-angle-based road vehicle tracking. In 1999 Conference on Computer Vision and Pattern Recognition (CVPR '99), 23-25 June 1999, Ft. Collins, CO, USA, pages 2360–2365, 1999.
[33] The Python team. The CPython global interpreter lock. https://wiki.python.org/moin/GlobalInterpreterLock.
[34] Giovanni Petrantoni and Jörg Wollenschläger. NimTorch. https://github.com/fragcolorxyz/nimtorch.
[35] Austin Huang, Junji Hashimoto, and Sam Stites. Hasktorch. https://github.com/hasktorch/hasktorch.
[36] G. Synnaeve, Z. Lin, J. Gehring, D. Gant, V. Mella, V. Khalidov, N. Carion, and N. Usunier. Forward modeling for partial observation strategy games - a StarCraft defogger. In Advances in Neural Information Processing Systems, pages 10761–10771, 2018.
[37] The PyTorch team. Torch Script. https://pytorch.org/docs/stable/jit.html.
[38] Justin Luitjens. CUDA streams. GPU Technology Conference, 2014.
[39] Emery D. Berger, Kathryn S. McKinley, Robert D. Blumofe, and Paul R. Wilson. Hoard: A scalable memory allocator for multithreaded applications. In Proceedings of the Ninth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS IX, pages 117–128, New York, NY, USA, 2000. ACM.
[40] J. Evans. A scalable concurrent malloc(3) implementation for FreeBSD. In BSDCan — The Technical BSD Conference, May 2006.
[41] S. Ghemawat and P. Menage. TCMalloc: Thread-caching malloc.
[42] Benjamin Recht, Christopher Ré, Stephen J. Wright, and Feng Niu. Hogwild: A lock-free approach to parallelizing stochastic gradient descent. In Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011, Granada, Spain, pages 693–701, 2011.
[43] Matthew Hertz and Emery D. Berger. Quantifying the performance of garbage collection vs. explicit memory management. In Proceedings of the 20th Annual ACM SIGPLAN Conference on Object-oriented Programming, Systems, Languages, and Applications, OOPSLA '05, pages 313–326, New York, NY, USA, 2005. ACM.
[44] The PyTorch team. PyTorch autograd profiler. https://pytorch.org/docs/1.0.1/autograd.html#profiler.

#42 Recovering from Biased Data: Can Fairness Constraints Improve Accuracy?

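#42 asks whether fairness constraints can recover the accuracy lost to biased training data. A minimal simulation of one bias model it considers, group-dependent label corruption (the flip rate and generator below are arbitrary choices for illustration, not the paper's experiment):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
group = rng.integers(0, 2, size=n)      # toy protected attribute
y_true = rng.integers(0, 2, size=n)     # unbiased ground-truth labels

# Group-dependent labeling bias: positives in group 1 are flipped to
# negative with probability 0.3, in the spirit of noisy-label models [1].
flip = (group == 1) & (y_true == 1) & (rng.uniform(size=n) < 0.3)
y_observed = np.where(flip, 0, y_true)

for g in (0, 1):
    print(g, y_true[group == g].mean(), y_observed[group == g].mean())
# Observed base rates now differ across groups although true rates match,
# which is exactly the situation a fairness constraint may help correct.
```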
References
[1] Dana Angluin and Philip Laird. Learning From Noisy Examples. Machine Learning, 2(4):343–370, Apr 1988.
[2] Julia Angwin, Jeff Larson, Surya Mattu, and Lauren Kirchner. Machine bias. ProPublica, May, 23:2016, 2016.
[3] Marianne Bertrand and Sendhil Mullainathan. Are Emily and Greg More Employable than Lakisha and Jamal? A Field Experiment on Labor Market Discrimination. American Economic Review, 94(4):991–1013, 2004.
[4] Tolga Bolukbasi, Kai-Wei Chang, James Y Zou, Venkatesh Saligrama, and Adam T Kalai. Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings. In Advances in Neural Information Processing Systems, pages 4349–4357, 2016.
[5] Joy Buolamwini and Timnit Gebru. Gender shades: Intersectional Accuracy Disparities in Commercial Gender Classification. In Conference on Fairness, Accountability and Transparency, pages 77–91, 2018.
[6] Alexandra Chouldechova. Fair Prediction With Disparate Impact: A Study of Bias in Recidivism Prediction Instruments. Big Data, 5(2):153–163, 2017.
[7] Danielle Keats Citron and Frank Pasquale. The Scored Society: Due Process for Automated Predictions. Wash. L. Rev., 89:1, 2014.
[8] Sam Corbett-Davies, Emma Pierson, Avi Feller, Sharad Goel, and Aziz Huq. Algorithmic decision making and the cost of fairness. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 797–806. ACM, 2017.
[9] Maria De-Arteaga, Artur Dubrawski, and Alexandra Chouldechova. Learning under selective labels in the presence of expert consistency. arXiv preprint arXiv:1807.00905, 2018.
[10] Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard S. Zemel. Fairness Through Awareness. In Innovations in Theoretical Computer Science 2012, Cambridge, MA, USA, January 8-10, 2012, pages 214–226, 2012.
[11] Anthony W Flores, Kristin Bechtel, and Christopher T Lowenkamp. False Positives, False Negatives, and False Analyses: A Rejoinder to Machine Bias: There's Software Used across the Country to Predict Future Criminals. And It's Biased against Blacks. Fed. Probation, 80:38, 2016.
[12] Sorelle A. Friedler, Carlos Scheidegger, and Suresh Venkatasubramanian. On the (im)possibility of fairness. CoRR, abs/1609.07236, 2016.
[13] Moritz Hardt, Eric Price, and Nati Srebro. Equality of Opportunity in Supervised Learning. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, editors, Advances in Neural Information Processing Systems 29, pages 3315–3323. Curran Associates, Inc., 2016.
[14] Heinrich Jiang and Ofir Nachum. Identifying and Correcting Label Bias in Machine Learning. CoRR, abs/1901.04966, 2019.
[15] Jon M. Kleinberg, Sendhil Mullainathan, and Manish Raghavan. Inherent Trade-Offs in the Fair Determination of Risk Scores. In 8th Innovations in Theoretical Computer Science Conference, ITCS 2017, January 9-11, 2017, Berkeley, CA, USA, pages 43:1–43:23, 2017.
[16] Jon M. Kleinberg and Manish Raghavan. Selection Problems in the Presence of Implicit Bias. In 9th Innovations in Theoretical Computer Science Conference, ITCS 2018, January 11-14, 2018, Cambridge, MA, USA, pages 33:1–33:17, 2018.

[17] Himabindu Lakkaraju, Jon Kleinberg, Jure Leskovec, Jens Ludwig, and Sendhil Mullainathan. The Selective Labels Problem: Evaluating Algorithmic Predictions in the Presence of Unobservables. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 275–284. ACM, 2017.

[18] Kristian Lum and William Isaac. To predict and serve? Significance, 13(5):14–19, 2016.

[19] Geoff Pleiss, Manish Raghavan, Felix Wu, Jon Kleinberg, and Kilian Q Weinberger. On Fairness and Calibration. In Advances in Neural Information Processing Systems, pages 5680–5689, 2017.

[20] Rashida Richardson, Jason Schultz, and Kate Crawford. Dirty Data, Bad Predictions: How Civil Rights Violations Impact Police Data, Predictive Policing Systems, and Justice. New York University Law Review Online, Forthcoming, 2019.

[21] Samuel Yeom and Michael Carl Tschantz. Discriminative but Not Discriminatory: A Comparison of Fairness Definitions under Different Worldviews. arXiv preprint arXiv:1808.08619, 2018.

#43 SphereFace: Deep Hypersphere Embedding for Face Recognition

Labeled Faces in the Wild (LFW)
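A-Softmax in SphereFace replaces ordinary softmax logits with angle-based ones: class weights are L2-normalized, biases are zeroed, and the target class's angle θ is multiplied by a margin m. A simplified sketch of the logit computation for one sample (the paper's piecewise-monotonic ψ(θ) and annealing are omitted, so this shows the idea rather than the trainable loss):

```python
import numpy as np

def a_softmax_logits(x, W, label, m=4):
    """Angular-margin logits in the spirit of SphereFace: cosine logits
    with normalized weights, the true class's angle multiplied by m.
    Simplified; not the full A-Softmax objective."""
    norm_x = np.linalg.norm(x)
    W = W / np.linalg.norm(W, axis=0, keepdims=True)   # normalize per class
    cos = W.T @ (x / norm_x)                           # (num_classes,)
    theta = np.arccos(np.clip(cos[label], -1.0, 1.0))
    logits = norm_x * cos
    logits[label] = norm_x * np.cos(m * theta)         # harder target logit
    return logits

rng = np.random.default_rng(0)
x = rng.normal(size=8)             # toy feature
W = rng.normal(size=(8, 5))        # toy weights for 5 classes
print(a_softmax_logits(x, W, label=2))
```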

References
[1] S. Chopra, R. Hadsell, and Y. LeCun. Learning a similarity metric discriminatively, with application to face verification. In CVPR, 2005.
[2] C. Ding and D. Tao. Robust face recognition via multimodal deep face representation. IEEE TMM, 17(11):2049–2058, 2015.
[3] R. Hadsell, S. Chopra, and Y. LeCun. Dimensionality reduction by learning an invariant mapping. In CVPR, 2006.
[4] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In CVPR, 2016.
[5] X. He, S. Yan, Y. Hu, P. Niyogi, and H.-J. Zhang. Face recognition using Laplacianfaces. TPAMI, 27(3):328–340, 2005.
[6] E. Hoffer and N. Ailon. Deep metric learning using triplet network. arXiv preprint:1412.6622, 2014.
[7] J. Hu, J. Lu, and Y.-P. Tan. Discriminative deep metric learning for face verification in the wild. In CVPR, 2014.
[8] G. B. Huang and E. Learned-Miller. Labeled faces in the wild: Updates and new reporting procedures. Dept. Comput. Sci., Univ. Massachusetts Amherst, Amherst, MA, USA, Tech. Rep. 14-003, 2014.
[9] G. B. Huang, M. Ramesh, T. Berg, and E. Learned-Miller. Labeled faces in the wild: A database for studying face recognition in unconstrained environments. Technical report, 2007.
[10] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. arXiv preprint:1408.5093, 2014.
[11] I. Kemelmacher-Shlizerman, S. M. Seitz, D. Miller, and E. Brossard. The MegaFace benchmark: 1 million faces for recognition at scale. In CVPR, 2016.
[12] M. Köstinger, M. Hirzer, P. Wohlhart, P. M. Roth, and H. Bischof. Large scale metric learning from equivalence constraints. In CVPR, 2012.
[13] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, 2012.
[14] K.-C. Lee, J. Ho, M.-H. Yang, and D. Kriegman. Video-based face recognition using probabilistic appearance manifolds. In CVPR, 2003.
[15] J. Liu, Y. Deng, and C. Huang. Targeting ultimate accuracy: Face recognition via deep embedding. arXiv preprint:1506.07310, 2015.
[16] W. Liu, Y. Wen, Z. Yu, and M. Yang. Large-margin softmax loss for convolutional neural networks. In ICML, 2016.
[17] J. Lu, G. Wang, W. Deng, P. Moulin, and J. Zhou. Multi-manifold deep metric learning for image set classification. In CVPR, 2015.
[18] D. Miller, E. Brossard, S. Seitz, and I. Kemelmacher-Shlizerman. MegaFace: A million faces for recognition at scale. arXiv preprint:1505.02108, 2015.
[19] H.-W. Ng and S. Winkler. A data-driven approach to cleaning large face datasets. In ICIP, 2014.
[20] O. M. Parkhi, A. Vedaldi, and A. Zisserman. Deep face recognition. In BMVC, 2015.
[21] A. Ross and A. K. Jain. Multimodal biometrics: An overview. In Signal Processing Conference, 2004 12th European, pages 1221–1224. IEEE, 2004.
[22] F. Schroff, D. Kalenichenko, and J. Philbin. FaceNet: A unified embedding for face recognition and clustering. In CVPR, 2015.
[23] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint:1409.1556, 2014.
[24] H. O. Song, Y. Xiang, S. Jegelka, and S. Savarese. Deep metric learning via lifted structured feature embedding. In CVPR, 2016.
[25] Y. Sun, Y. Chen, X. Wang, and X. Tang. Deep learning face representation by joint identification-verification. In NIPS, 2014.
[26] Y. Sun, X. Wang, and X. Tang. Deep learning face representation from predicting 10,000 classes. In CVPR, 2014.
[27] Y. Sun, X. Wang, and X. Tang. Deeply learned face representations are sparse, selective, and robust. In CVPR, 2015.
[28] Y. Sun, X. Wang, and X. Tang. Sparsifying neural network connections for face recognition. In CVPR, 2016.
[29] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In CVPR, 2015.
[30] Y. Taigman, M. Yang, M. Ranzato, and L. Wolf. DeepFace: Closing the gap to human-level performance in face verification. In CVPR, 2014.
[31] A. Talwalkar, S. Kumar, and H. Rowley. Large-scale manifold learning. In CVPR, 2008.
[32] J. Wang, Y. Song, T. Leung, C. Rosenberg, J. Wang, J. Philbin, B. Chen, and Y. Wu. Learning fine-grained image similarity with deep ranking. In CVPR, 2014.
[33] K. Q. Weinberger and L. K. Saul. Distance metric learning for large margin nearest neighbor classification. Journal of Machine Learning Research, 10(Feb):207–244, 2009.
[34] Y. Wen, K. Zhang, Z. Li, and Y. Qiao. A discriminative feature learning approach for deep face recognition. In ECCV, 2016.
[35] L. Wolf, T. Hassner, and I. Maoz. Face recognition in unconstrained videos with matched background similarity. In CVPR, 2011.
[36] E. P. Xing, A. Y. Ng, M. I. Jordan, and S. Russell. Distance metric learning with application to clustering with side-information. NIPS, 2003.
[37] D. Yi, Z. Lei, S. Liao, and S. Z. Li. Learning face representation from scratch. arXiv preprint:1411.7923, 2014.
[38] Y. Ying and P. Li. Distance metric learning with eigenvalue optimization. JMLR, 13(Jan):1–26, 2012.
[39] K. Zhang, Z. Zhang, Z. Li, and Y. Qiao. Joint face detection and alignment using multi-task cascaded convolutional networks. arXiv preprint:1604.02878, 2016.

#44 Striking the Right Balance with Uncertainty

Celebrities in Frontal Profile (CFP)
Balanced Classification Accuracy (BCA)
uncertainty-based margin (UMM)
sample-level uncertainty modeling (SUM)
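BCA in the list above is the headline metric for imbalanced recognition: the mean of per-class recalls, so a majority class cannot hide minority-class failures. A minimal sketch:

```python
import numpy as np

def balanced_accuracy(y_true, y_pred):
    """Balanced Classification Accuracy: mean of per-class recalls."""
    classes = np.unique(y_true)
    return np.mean([(y_pred[y_true == c] == c).mean() for c in classes])

y_true = np.array([0] * 90 + [1] * 10)       # 9:1 imbalanced toy labels
y_pred = np.zeros(100, dtype=int)            # degenerate majority predictor
print((y_pred == y_true).mean())             # 0.90 plain accuracy
print(balanced_accuracy(y_true, y_pred))     # 0.50 balanced accuracy
```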

References [1] R. Akbani, S. Kwek, and N. Japkowicz. Applying support vector machines to imbalanced datasets. In European conference on machine learning, pages 39–50. Springer, 2004. [2] L. Ballerini, R. B. Fisher, B. Aldridge, and J. Rees. Nonmelanoma skin lesion classification using colour image data in a hierarchical k-nn classifier. In Biomedical Imaging (ISBI), 2012 9th IEEE International Symposium on, pages 358–361. IEEE, 2012. [3] L. Ballerini, R. B. Fisher, B. Aldridge, and J. Rees. A color and texture based hierarchical k-nn approach to the classification of non-melanoma skin lesions. In Color Medical Image Analysis, pages 63–86. Springer, 2013. [4] P. Cao, D. Zhao, and O. Zaiane. An optimized cost-sensitive svm for imbalanced data learning. In Pacific-Asia Conference on Knowledge Discovery and Data Mining, pages 280– 292. Springer, 2013. [5] Q. Cao, L. Shen, W. Xie, O. M. Parkhi, and A. Zisserman. Vggface2: A dataset for recognising faces across pose and age. In International Conference on Automatic Face and Gesture Recognition, 2018. [6] J.-R. Chang and Y.-S. Chen. Batch-normalized maxout network in network. arXiv preprint arXiv:1511.02583, 2015. [7] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer. Smote: synthetic minority over-sampling technique. Journal of artificial intelligence research, 16:321– 357, 2002. [8] B. Chen, W. Deng, and J. Du. Noisy softmax: Improving the generalization ability of dcnn via postponing the early softmax saturation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5372– 5381, 2017. [9] J. Chen, C.-A. Tsai, H. Moon, H. Ahn, J. Young, and C.- H. Chen. Decision threshold adjustment in class prediction. SAR and QSAR in Environmental Research, 17(3):337–352, 2006. [10] Y. Cui, M. Jia, T.-Y. Lin, Y. Song, and S. Belongie. Classbalanced loss based on effective number of samples. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019. [11] J. Deng, W. Dong, R. Socher, L. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 248–255, 2009. [12] J. Deng, J. Guo, and S. Zafeiriou. Arcface: Additive angular margin loss for deep face recognition. arXiv preprint arXiv:1801.07698, 2018. [13] J. Deng, Y. Zhou, and S. Zafeiriou. Marginal loss for deep face recognition. In 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 2006–2014. IEEE, 2017. [14] C. Drummond and R. C. Holte. C4. 5, class imbalance, and cost sensitivity: Why under-sampling beats over-sampling. ICML Workshops, 2003. [15] R. Feinman, R. R. Curtin, S. Shintre, and A. B. Gardner. Detecting adversarial samples from artifacts. arXiv preprint arXiv:1703.00410, 2017. [16] Y. Gal and Z. Ghahramani. Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In international conference on machine learning, pages 1050– 1059, 2016. [17] M. Hayat, S. Khan, W. Zamir, J. Shen, and L. Shao. Maxmargin class imbalanced learning with gaussian affinity. arXiv preprint arXiv:1901.07711, 2019. [18] H. He, Y. Bai, E. A. Garcia, and S. Li. Adasyn: Adaptive synthetic sampling approach for imbalanced learning. In Neural Networks, 2008. IJCNN 2008.(IEEE World Congress on Computational Intelligence). IEEE International Joint Conference on, pages 1322–1328. IEEE, 2008. [19] H. He and E. A. Garcia. Learning from imbalanced data. 
IEEE Transactions on knowledge and data engineering, 21(9):1263–1284, 2009. [20] J. Hu, L. Shen, and G. Sun. Squeeze-and-excitation networks. arXiv preprint arXiv:1709.01507, 7, 2017. [21] C. Huang, Y. Li, C. Change Loy, and X. Tang. Learning deep representation for imbalanced classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5375–5384, 2016. [22] P. Jeatrakul, K. W. Wong, and C. C. Fung. Classification of imbalanced data by combining the complementary neural network and smote algorithm. In International Conference on Neural Information Processing, pages 152–159. Springer, 2010. [23] A. Kendall, Y. Gal, and R. Cipolla. Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018. [24] S. Khan, H. Rahmani, S. A. A. Shah, and M. Bennamoun. A guide to convolutional neural networks for computer vision. Synthesis Lectures on Computer Vision, 8(1):1–207, 2018. [25] S. H. Khan, M. Hayat, M. Bennamoun, F. Sohel, and R. Togneri. Cost sensitive learning of deep feature representations from imbalanced data. IEEE Transactions on Neural Networks and Learning Systems, 2017. [26] J. Laurikkala. Improving identification of difficult small classes by balancing class distribution. In Conference on Artificial Intelligence in Medicine in Europe, pages 63–66. Springer, 2001. [27] G. B. H. E. Learned-Miller. Labeled faces in the wild: Updates and new reporting procedures. Technical Report UMCS- 2014-003, University of Massachusetts, Amherst, May 2014. [28] C.-Y. Lee, P. W. Gallagher, and Z. Tu. Generalizing pooling functions in convolutional neural networks: Mixed, gated, and tree. In Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, pages 464–472, 2016. [29] C.-Y. Lee, S. Xie, P. Gallagher, Z. Zhang, and Z. Tu. Deeplysupervised nets. 2015. [30] J. Liu, Y. Deng, T. Bai, Z.Wei, and C. Huang. Targeting ultimate accuracy: Face recognition via deep embedding. arXiv preprint arXiv:1506.07310, 2015. [31] W. Liu, Y. Wen, Z. Yu, M. Li, B. Raj, and L. Song. Sphereface: Deep hypersphere embedding for face recognition. [32] W. Liu, Y. Wen, Z. Yu, and M. Yang. Large-margin softmax loss for convolutional neural networks. In International Conference on Machine Learning, pages 507–516, 2016. [33] Z. Liu, P. Luo, X.Wang, and X. Tang. Deep learning face attributes in the wild. In Proceedings of the IEEE International Conference on Computer Vision, pages 3730–3738, 2015. [34] Z. Liu, P. Luo, X. Wang, and X. Tang. Deep learning face attributes in the wild. In Proceedings of International Conference on Computer Vision (ICCV), 2015. [35] I. Mani and I. Zhang. knn approach to unbalanced data distributions: a case study involving information extraction. In Proceedings of workshop on learning from imbalanced datasets, volume 126, 2003. [36] I. Masi, A. T. Trn, T. Hassner, J. T. Leksut, and G. Medioni. Do we really need to collect millions of faces for effective face recognition? In European Conference on Computer Vision, pages 579–596. Springer, 2016. [37] S. Moschoglou, A. Papaioannou, C. Sagonas, J. Deng, I. Kotsia, and S. Zafeiriou. Agedb: the first manually collected, inthe- wild age database. In Proceedings of IEEE Intl Conf. on Computer Vision and Pattern Recognition (CVPR-W 2017), Honolulu, Hawaii, June 2017. [38] S. Rahman, S. Khan, and F. Porikli. 
Zero-shot object detection: Learning to simultaneously recognize and localize novel concepts. arXiv preprint arXiv:1803.06049, 2018. [39] C. E. Rasmussen. Gaussian processes in machine learning. In Advanced lectures on machine learning, pages 63–71. Springer, 2004. [40] M. Ren, W. Zeng, B. Yang, and R. Urtasun. Learning to reweight examples for robust deep learning. In International Conference on Machine Learning, 2018. [41] S. Sengupta, J.-C. Chen, C. Castillo, V. M. Patel, R. Chellappa, and D. W. Jacobs. Frontal to profile face verification in the wild. In IEEE Winter Conference on Applications of Computer Vision (WACV), February 2016. [42] J. A. Sáez, J. Luengo, J. Stefanowski, and F. Herrera. SMOTE-IPF: Addressing the noisy and borderline examples problem in imbalanced classification by a re-sampling method with filtering. Information Sciences, 291:184–203, 2015. [43] F. Schroff, D. Kalenichenko, and J. Philbin. FaceNet: A unified embedding for face recognition and clustering. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 815–823, 2015. [44] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In International Conference on Learning Representations (ICLR), 2015. [45] M. R. Smith, T. Martinez, and C. Giraud-Carrier. An instance level analysis of data complexity. Machine Learning, 95(2):225–256, 2014. [46] N. Srivastava, G. E. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(1):1929–1958, 2014. [47] Y. Sun, Y. Chen, X. Wang, and X. Tang. Deep learning face representation by joint identification-verification. In Advances in neural information processing systems, pages 1988–1996, 2014. [48] Y. Sun, X. Wang, and X. Tang. Deeply learned face representations are sparse, selective, and robust. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2892–2900, 2015. [49] Y. Taigman, M. Yang, M. Ranzato, and L. Wolf. DeepFace: Closing the gap to human-level performance in face verification. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1701–1708, 2014. [50] Y. Taigman, M. Yang, M. Ranzato, and L. Wolf. Web-scale training for face identification. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2746–2754, 2015. [51] C. Tzelepis. Maximum Margin Learning Under Uncertainty. PhD thesis, Queen Mary University of London, 2018. [52] C. Tzelepis, V. Mezaris, and I. Patras. Linear maximum margin classifier for learning from uncertain data. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017. [53] B. C. Wallace, K. Small, C. E. Brodley, and T. A. Trikalinos. Class imbalance, redux. In Data Mining (ICDM), 2011 IEEE 11th International Conference on, pages 754–763. IEEE, 2011. [54] F. Wang, J. Cheng, W. Liu, and H. Liu. Additive margin softmax for face verification. IEEE Signal Processing Letters, 25(7):926–930, 2018. [55] H. Wang, Y. Wang, Z. Zhou, X. Ji, D. Gong, J. Zhou, Z. Li, and W. Liu. CosFace: Large margin cosine loss for deep face recognition. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018. [56] Y. Wen, K. Zhang, Z. Li, and Y. Qiao. A discriminative feature learning approach for deep face recognition. In European Conference on Computer Vision, pages 499–515. Springer, 2016. [57] L. Wolf, T. Hassner, and I. Maoz. 
Face recognition in unconstrained videos with matched background similarity. In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, pages 529–534. IEEE, 2011. [58] Y. Wu, H. Liu, J. Li, and Y. Fu. Deep face recognition with center invariant loss. In Proceedings of the Thematic Workshops of ACM Multimedia 2017, pages 408–414. ACM, 2017. [59] S.-J. Yen and Y.-S. Lee. Cluster-based under-sampling approaches for imbalanced data distributions. Expert Systems with Applications, 36(3):5718–5727, 2009. [60] X. Yin, X. Yu, K. Sohn, X. Liu, and M. Chandraker. Feature transfer learning for deep face recognition with long-tail data. arXiv preprint arXiv:1803.09014, 2018. [61] K. Zhang, Z. Zhang, Z. Li, and Y. Qiao. Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Processing Letters, 23(10):1499–1503, 2016. [62] N. Zhang, M. Paluri, M. Ranzato, T. Darrell, and L. Bourdev. PANDA: Pose aligned networks for deep attribute modeling. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1637–1644, 2014. [63] X. Zhang, Z. Fang, Y. Wen, Z. Li, and Y. Qiao. Range loss for deep face recognition with long-tailed training data. In Proceedings of the IEEE International Conference on Computer Vision, pages 5409–5418, 2017.

#45 The Cost of Fairness in Binary Classification
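
The paper asks how much predictive accuracy a binary classifier must give up to satisfy a fairness constraint. As a rough, hedged illustration of that trade-off (not the paper's own derivation), the sketch below compares the best single-threshold accuracy against per-group thresholds chosen to equalize positive rates; all score distributions and parameters are synthetic placeholders.

```python
# Minimal sketch (assumed setup, not from the paper): the accuracy "cost"
# of forcing equal positive rates across two groups via threshold adjustment.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
group = rng.integers(0, 2, n)                       # protected attribute: 0 or 1
# group 1's scores are shifted down to simulate disparate score distributions
score = rng.normal(loc=np.where(group == 0, 0.6, 0.4), scale=0.2)
label = (score + rng.normal(0, 0.1, n) > 0.5).astype(int)

def accuracy(thr0, thr1):
    pred = np.where(group == 0, score > thr0, score > thr1)
    return (pred == label).mean()

# Unconstrained: one threshold for everyone, tuned for accuracy
best_single = max(accuracy(t, t) for t in np.linspace(0, 1, 101))

# Parity-constrained: per-group thresholds that equalize positive rates
target_rate = 0.5
thr = [np.quantile(score[group == g], 1 - target_rate) for g in (0, 1)]
fair_acc = accuracy(thr[0], thr[1])

print(f"unconstrained accuracy:      {best_single:.3f}")
print(f"parity-constrained accuracy: {fair_acc:.3f} (cost: {best_single - fair_acc:.3f})")
```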

References Maria-Florina Balcan and Avrim Blum. A discriminative model for semi-supervised learning. Journal of the ACM, 57(3):19:1–19:46, March 2010. Lynwood Bryant. The role of thermodynamics in the evolution of heat engines. Technology and Culture, 14(2):152–165, 1973. Toon Calders and Sicco Verwer. Three Naive Bayes approaches for discrimination-free classification. Data Mining and Knowledge Discovery, 21(2):277–292, 2010. Sam Corbett-Davies, Emma Pierson, Avi Feller, Sharad Goel, and Aziz Huq. Algorithmic decision making and the cost of fairness. CoRR, abs/1701.08230, 2017. URL http://arxiv.org/abs/1701.08230. Luc Devroye, László Györfi, and Gábor Lugosi. A Probabilistic Theory of Pattern Recognition. Springer, 1996. Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard Zemel. Fairness through awareness. In Innovations in Theoretical Computer Science Conference (ITCS), pages 214–226, 2012. EEOC. Uniform guidelines on employee selection procedures. https://www.eeoc.gov/policy/docs/qanda_clarify_procedures.html, 1979. Charles Elkan. The foundations of cost-sensitive learning. In International Joint Conference on Artificial Intelligence (IJCAI), pages 973–978, 2001. Tom Fawcett. An introduction to ROC analysis. Pattern Recognition Letters, 27(8):861–874, June 2006. ISSN 0167-8655. Michael Feldman, Sorelle A. Friedler, John Moeller, Carlos Scheidegger, and Suresh Venkatasubramanian. Certifying and removing disparate impact. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pages 259–268, 2015. Kazuto Fukuchi, Jun Sakuma, and Toshihiro Kamishima. Prediction with model-based neutrality. In European Conference on Machine Learning and Knowledge Discovery in Databases (ECML PKDD), pages 499–514, 2013. James Gleick. The Information: A History, a Theory, a Flood. Fourth Estate, 2011. Moritz Hardt, Eric Price, and Nathan Srebro. Equality of opportunity in supervised learning. In Advances in Neural Information Processing Systems (NIPS), December 2016. John C. Harsanyi. Cardinal welfare, individualistic ethics, and interpersonal comparisons of utility. The Journal of Political Economy, 63(4):309–321, 1955. Faisal Kamiran and Toon Calders. Classification without discrimination. In IEEE International Conference on Computer, Control and Communication (IEEE-IC4), 2009. Faisal Kamiran, Toon Calders, and Mykola Pechenizkiy. Discrimination aware decision tree learning. In Data Mining (ICDM), 2010 IEEE 10th International Conference on, pages 869–874. IEEE, 2010. Toshihiro Kamishima, Shotaro Akaho, Hideki Asoh, and Jun Sakuma. Fairness-aware classifier with prejudice remover regularizer. In European Conference on Machine Learning and Knowledge Discovery in Databases (ECML PKDD), pages 35–50, 2012. Harikrishna Narasimhan, Rohit Vaish, and Shivani Agarwal. On the statistical consistency of plug-in classifiers for non-decomposable performance measures. In Advances in Neural Information Processing Systems (NIPS), pages 1493–1501, 2014. Harikrishna Narasimhan, Purushottam Kar, and Prateek Jain. Optimizing non-decomposable performance measures: A tale of two classes. In International Conference on Machine Learning (ICML), pages 199–208, 2015. Shameem Puthiya Parambath, Nicolas Usunier, and Yves Grandvalet. Optimizing F-measures by cost-sensitive classification. In Advances in Neural Information Processing Systems (NIPS), pages 2123–2131, 2014. Dino Pedreshi, Salvatore Ruggieri, and Franco Turini. 
Discrimination-aware data mining. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pages 560–568, 2008. John Rawls. A Theory of Justice. Harvard University Press, 1971. Mark D. Reid and Robert C. Williamson. Surrogate regret bounds for proper losses. In International Conference on Machine Learning (ICML), pages 897–904, 2009. Amartya K. Sen. The Idea of Justice. Harvard University Press, 2009. Blake E. Woodworth, Suriya Gunasekar, Mesrob I. Ohannessian, and Nathan Srebro. Learning non-discriminatory predictors. In Conference on Learning Theory (COLT), pages 1920–1953, 2017. Bilal Zafar, Isabel Valera, Manuel Gomez-Rodriguez, and Krishna Gummadi. Fairness beyond disparate treatment & disparate impact: Learning classification without disparate mistreatment. In International World Wide Web Conference (WWW), 2017. Muhammad Bilal Zafar, Isabel Valera, Manuel Gomez-Rodriguez, and Krishna Gummadi. Learning fair classifiers. arXiv preprint arXiv:1507.05259, 2016. Indrė Žliobaitė. On the relation between accuracy and fairness in binary classification. In Workshop on Fairness, Accountability, and Transparency in Machine Learning (FATML), 2015. Indrė Žliobaitė. Measuring discrimination in algorithmic decision making. Data Mining and Knowledge Discovery, 31:1060–1089, 2017. Indrė Žliobaitė, Faisal Kamiran, and Toon Calders. Handling conditional discrimination. In Data Mining (ICDM), 2011 IEEE 11th International Conference on, pages 992–1001. IEEE, 2011.

#46 The Limitations of Deep Learning in Adversarial Settings

SPAM
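
The paper crafts adversarial examples from a network's forward derivative (Jacobian-based saliency maps). As a hedged stand-in, the sketch below implements the closely related fast gradient sign method of its reference [18] against a toy logistic model, so it runs without a deep learning framework; the weights and "image" are synthetic placeholders.

```python
# Minimal FGSM sketch (Goodfellow et al., reference [18]) on a toy logistic
# model; the paper itself builds saliency maps from the Jacobian instead.
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=784) / np.sqrt(784)   # stand-in linear classifier weights
b = 0.0
x = rng.uniform(0, 1, size=784)           # stand-in "image", 784 pixels in [0, 1]
y = 1.0                                   # its true label

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Gradient of the logistic loss -log p(y|x) with respect to the input x
p = sigmoid(w @ x + b)
grad_x = (p - y) * w

eps = 0.1                                  # max per-pixel perturbation budget
x_adv = np.clip(x + eps * np.sign(grad_x), 0, 1)

print("p(y=1) clean:      ", sigmoid(w @ x + b))
print("p(y=1) adversarial:", sigmoid(w @ x_adv + b))
```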

REFERENCES [1] E. G. Amoroso. Fundamentals of Computer Security Technology. Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 1994. [2] M. Barreno, B. Nelson, A. D. Joseph, and J. Tygar. The security of machine learning. Machine Learning, 81(2):121–148, 2010. [3] M. Barreno, B. Nelson, R. Sears, A. D. Joseph, and J. D. Tygar. Can machine learning be secure? In Proceedings of the 2006 ACM Symposium on Information, computer and communications security, pages 16–25. ACM, 2006. [4] Y. Bengio. Learning deep architectures for AI. Foundations and trends in Machine Learning, 2(1):1–127, 2009. [5] J. Bergstra, O. Breuleux, F. Bastien, P. Lamblin, R. Pascanu, G. Desjardins, J. Turian, D. Warde-Farley, and Y. Bengio. Theano: a CPU and GPU math expression compiler. In Proceedings of the Python for scientific computing conference (SciPy), volume 4, page 3. Austin, TX, 2010. [6] B. Biggio, I. Corona, D. Maiorca, B. Nelson, N. Šrndić, P. Laskov, G. Giacinto, and F. Roli. Evasion attacks against machine learning at test time. In Machine Learning and Knowledge Discovery in Databases, pages 387–402. Springer, 2013. [7] B. Biggio, G. Fumera, and F. Roli. Pattern recognition systems under attack: Design issues and research challenges. International Journal of Pattern Recognition and Artificial Intelligence, 28(07):1460002, 2014. [8] B. Biggio, G. Fumera, and F. Roli. Security evaluation of pattern classifiers under attack. Knowledge and Data Engineering, IEEE Transactions on, 26(4):984–996, 2014. [9] B. Biggio, B. Nelson, and P. Laskov. Support vector machines under adversarial label noise. In ACML, pages 97–112, 2011. [10] B. Biggio, B. Nelson, and P. Laskov. Poisoning attacks against support vector machines. In Proceedings of the 29th International Conference on Machine Learning, 2012. [11] B. Biggio, K. Rieck, D. Ariu, C. Wressnegger, I. Corona, G. Giacinto, and F. Roli. Poisoning behavioral malware clustering. In Proceedings of the 2014 Workshop on Artificial Intelligent and Security Workshop, pages 27–36. ACM, 2014. [12] D. Cireşan, U. Meier, J. Masci, et al. Multi-column deep neural network for traffic sign classification. Neural Networks, 32:333–338, 2012. [13] R. Collobert and J. Weston. A unified architecture for natural language processing: Deep neural networks with multitask learning. In Proceedings of the 25th international conference on Machine learning, pages 160–167. ACM, 2008. [14] G. E. Dahl, J. W. Stokes, L. Deng, and D. Yu. Large-scale malware classification using random projections and neural networks. In Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, pages 3422–3426. IEEE, 2013. [15] G. E. Dahl, D. Yu, et al. Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition. IEEE Transactions on Audio, Speech, and Language Processing, 20(1):30–42, 2012. [16] P. Fogla and W. Lee. Evading network anomaly detection systems: formal reasoning and practical techniques. In Proceedings of the 13th ACM conference on Computer and communications security, pages 59–68. ACM, 2006. [17] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, et al. Generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2672–2680, 2014. [18] I. J. Goodfellow, J. Shlens, and C. Szegedy. Explaining and harnessing adversarial examples. In Proceedings of the 2015 International Conference on Learning Representations. Computational and Biological Learning Society, 2015. [19] S. Gu and L. Rigazio. 
Towards deep neural network architectures robust to adversarial examples. In Proceedings of the 2015 International Conference on Learning Representations. Computational and Biological Learning Society, 2015. [20] G. Hinton, S. Osindero, and Y.-W. Teh. A fast learning algorithm for deep belief nets. Neural Computation, 18(7):1527–1554, 2006. [21] K. Hornik, M. Stinchcombe, et al. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366, 1989. [22] L. Huang, A. D. Joseph, B. Nelson, B. I. Rubinstein, and J. Tygar. Adversarial machine learning. In Proceedings of the 4th ACM workshop on Security and artificial intelligence, pages 43–58. ACM, 2011. [23] E. Knorr. How PayPal beats the bad guys with machine learning. http://www.infoworld.com/article/2907877/machine-learning/how-paypal-reduces-fraud-with-machine-learning.html, 2015. [24] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in neural information processing systems, pages 1097–1105, 2012. [25] H. Larochelle, Y. Bengio, J. Louradour, and P. Lamblin. Exploring strategies for training deep neural networks. The Journal of Machine Learning Research, 10:1–40, 2009. [26] Y. LeCun, L. Bottou, et al. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998. [27] Y. LeCun and C. Cortes. The MNIST database of handwritten digits, 1998. [28] LISA lab. http://deeplearning.net/tutorial/lenet.html, 2010. [29] K. P. Murphy. Machine learning: a probabilistic perspective. MIT Press, 2012. [30] A. Nguyen, J. Yosinski, and J. Clune. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In Computer Vision and Pattern Recognition (CVPR 2015). IEEE, 2015. [31] D. E. Rumelhart, G. E. Hinton, and R. J. Williams. Learning representations by back-propagating errors. Cognitive modeling, 5, 1988. [32] T. N. Sainath, O. Vinyals, A. Senior, and H. Sak. Convolutional, long short-term memory, fully connected deep neural networks. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015. [33] H. Sak, A. Senior, and F. Beaufays. Long short-term memory recurrent neural network architectures for large scale acoustic modeling. In Proceedings of the Annual Conference of International Speech Communication Association (INTERSPEECH), 2014. [34] K. Simonyan, A. Vedaldi, and A. Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. [35] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. arXiv preprint arXiv:1409.4842, 2014. [36] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus. Intriguing properties of neural networks. In Proceedings of the 2014 International Conference on Learning Representations. Computational and Biological Learning Society, 2014. [37] Y. Taigman, M. Yang, et al. DeepFace: Closing the gap to human-level performance in face verification. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1701–1708. IEEE, 2014. [38] J. Yosinski, J. Clune, Y. Bengio, and H. Lipson. How transferable are features in deep neural networks? In Advances in Neural Information Processing Systems, pages 3320–3328, 2014.

#47 Trainable Undersampling for Class-Imbalance Learning

AUCPRC ALLKNN

area under the curve (AUC) Diabetic Retinopathy (DR) Evolutionary Undersampling (EUS) geometric mean (GM) gated recurrent unit (GRU) k-nearest neighbor (KNN) Matthews correlation coefficient (MCC) Markov decision process (MDP) Principal Component Analysis (PCA) random majority undersampling (RUS)
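
Of the resampling baselines above, random majority undersampling (RUS) is the simplest; the paper's point is to learn which majority samples to drop (framed as a Markov decision process) rather than dropping them at random. A minimal, hedged RUS sketch on synthetic data:

```python
# Minimal random-majority-undersampling (RUS) sketch on synthetic data;
# this is the baseline, not the paper's learned undersampling policy.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (rng.uniform(size=1000) < 0.05).astype(int)    # roughly 5% minority class

def random_undersample(X, y, rng):
    minority = np.flatnonzero(y == 1)
    majority = np.flatnonzero(y == 0)
    # keep every minority sample plus an equal-sized random majority subset
    kept_majority = rng.choice(majority, size=minority.size, replace=False)
    idx = np.concatenate([minority, kept_majority])
    rng.shuffle(idx)
    return X[idx], y[idx]

X_bal, y_bal = random_undersample(X, y, rng)
print("class counts before:", np.bincount(y), "after:", np.bincount(y_bal))
```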

References Bahdanau, D.; Cho, K.; and Bengio, Y. 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473. Batista, G. E.; Prati, R. C.; and Monard, M. C. 2004. A study of the behavior of several methods for balancing machine learning training data. ACM SIGKDD Explorations Newsletter 6(1):20–29. Błaszczyński, J., and Stefanowski, J. 2015. Neighbourhood sampling in bagging for imbalanced data. Neurocomputing 150:529–542. Calders, T., and Jaroszewicz, S. 2007. Efficient AUC optimization for classification. In European Conference on Principles of Data Mining and Knowledge Discovery, 42–53. Springer. Chawla, N. V.; Bowyer, K. W.; Hall, L. O.; and Kegelmeyer, W. P. 2002. SMOTE: synthetic minority over-sampling technique. Journal of Artificial Intelligence Research 16:321–357. Cieslak, D. A., and Chawla, N. V. 2008. Start globally, optimize locally, predict globally: Improving performance on imbalanced data. In Data Mining, 2008. ICDM'08. Eighth IEEE International Conference on, 143–152. IEEE. Dal Pozzolo, A.; Caelen, O.; Johnson, R. A.; and Bontempi, G. 2015. Calibrating probability with undersampling for unbalanced classification. In Computational Intelligence, 2015 IEEE Symposium Series on, 159–166. IEEE. Eban, E.; Schain, M.; Mackey, A.; Gordon, A.; Rifkin, R.; and Elidan, G. 2017. Scalable learning of non-decomposable objectives. In Artificial Intelligence and Statistics, 832–840. De Fauw, J. 2015. 5th place solution of the Kaggle Diabetic Retinopathy competition. Fernández, A.; García, S.; del Jesus, M. J.; and Herrera, F. 2008. A study of the behaviour of linguistic fuzzy rule based classification systems in the framework of imbalanced datasets. Fuzzy Sets and Systems 159(18):2378–2398. Ganganwar, V. 2012. An overview of classification algorithms for imbalanced datasets. International Journal of Emerging Technology and Advanced Engineering 2(4):42–47. García, S., and Herrera, F. 2009. Evolutionary undersampling for classification with imbalanced datasets: Proposals and taxonomy. Evolutionary Computation 17(3):275–306. Hanley, J. A., and McNeil, B. J. 1982. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 143(1):29–36. He, H., and Garcia, E. A. 2009. Learning from imbalanced data. IEEE Transactions on Knowledge and Data Engineering 21(9):1263–1284. He, H.; Bai, Y.; Garcia, E. A.; and Li, S. 2008. ADASYN: Adaptive synthetic sampling approach for imbalanced learning. In Neural Networks, 2008. IJCNN 2008. (IEEE World Congress on Computational Intelligence). IEEE International Joint Conference on, 1322–1328. IEEE. Herschtal, A., and Raskutti, B. 2004. Optimising area under the ROC curve using gradient descent. In Proceedings of the twenty-first international conference on Machine learning, 49. ACM. Kang, P., and Cho, S. 2006. EUS SVMs: Ensemble of under-sampled SVMs for data imbalance problems. In International Conference on Neural Information Processing, 837–846. Springer. Leibig, C.; Allken, V.; Ayhan, M. S.; Berens, P.; and Wahl, S. 2017. Leveraging uncertainty information from deep neural networks for disease detection. Scientific Reports 7(1):17816. Lemaître, G.; Nogueira, F.; and Aridas, C. K. 2017. Imbalanced-learn: A python toolbox to tackle the curse of imbalanced datasets in machine learning. Journal of Machine Learning Research 18(17):1–5. Liu, X.-Y.; Wu, J.; and Zhou, Z.-H. 2009. Exploratory undersampling for class-imbalance learning. 
IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 39(2):539–550. Mani, I., and Zhang, I. 2003. kNN approach to unbalanced data distributions: a case study involving information extraction. In Proceedings of workshop on learning from imbalanced datasets, volume 126. Matthews, B. W. 1975. Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochimica et Biophysica Acta (BBA)-Protein Structure 405(2):442–451. Norouzi, M.; Bengio, S.; Jaitly, N.; Schuster, M.; Wu, Y.; Schuurmans, D.; et al. 2016. Reward augmented maximum likelihood for neural structured prediction. In Advances in Neural Information Processing Systems, 1723–1731. Parambath, S. P.; Usunier, N.; and Grandvalet, Y. 2014. Optimizing F-measures by cost-sensitive classification. In Advances in Neural Information Processing Systems, 2123–2131. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; Vanderplas, J.; Passos, A.; Cournapeau, D.; Brucher, M.; Perrot, M.; and Duchesnay, E. 2011. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research 12:2825–2830. Ranzato, M.; Chopra, S.; Auli, M.; and Zaremba, W. 2015. Sequence level training with recurrent neural networks. arXiv preprint arXiv:1511.06732. Tieleman, T., and Hinton, G. 2012. Lecture 6.5-RMSProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning 4(2):26–31. Tomek, I. 1976a. An experiment with the edited nearest-neighbor rule. IEEE Transactions on Systems, Man, and Cybernetics (6):448–452. Tomek, I. 1976b. Two modifications of CNN. IEEE Trans. Systems, Man and Cybernetics 6:769–772. Van Hulse, J.; Khoshgoftaar, T. M.; and Napolitano, A. 2007. Experimental perspectives on learning from imbalanced data. In Proceedings of the 24th international conference on Machine learning, 935–942. ACM. Williams, R. J. 1992. Simple statistical gradient-following algorithms for connectionist reinforcement learning. In Reinforcement Learning. Springer. 5–32. Wilson, D. L. 1972. Asymptotic properties of nearest neighbor rules using edited data. IEEE Transactions on Systems, Man, and Cybernetics (3):408–421. Wu, Y.; Schuster, M.; Chen, Z.; Le, Q. V.; Norouzi, M.; Macherey, W.; Krikun, M.; Cao, Y.; Gao, Q.; Macherey, K.; et al. 2016. Google's neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144.

#48 Transfer of Machine Learning Fairness across Domains

false positive rate (FPR) false negative rate (FNR)
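
These per-group error rates are the quantities tracked when asking whether fairness achieved in a source domain carries over to a target domain. A minimal, hedged sketch of measuring the FPR and FNR gaps between two groups, with synthetic predictions standing in for a real model:

```python
# Minimal sketch: per-group FPR/FNR and their gaps on synthetic predictions.
import numpy as np

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 1000)
# a noisy stand-in predictor that is right about 80% of the time
y_pred = np.where(rng.uniform(size=1000) < 0.8, y_true, 1 - y_true)
group = rng.integers(0, 2, 1000)

def fpr_fnr(y_true, y_pred):
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    fpr = fp / max(np.sum(y_true == 0), 1)
    fnr = fn / max(np.sum(y_true == 1), 1)
    return fpr, fnr

rates = {g: fpr_fnr(y_true[group == g], y_pred[group == g]) for g in (0, 1)}
print("FPR gap:", abs(rates[0][0] - rates[1][0]))
print("FNR gap:", abs(rates[0][1] - rates[1][1]))
```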

References [1] A. Agarwal, A. Beygelzimer, M. Dudík, J. Langford, and H. M. Wallach. A reductions approach to fair classification. In Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018, pages 60–69, 2018. [2] S. Ben-David, J. Blitzer, K. Crammer, and F. Pereira. Analysis of representations for domain adaptation. In Advances in neural information processing systems, pages 137–144, 2007. [3] S. Ben-David, J. Blitzer, K. Crammer, A. Kulesza, F. Pereira, and J. W. Vaughan. A theory of learning from different domains. Machine Learning, 79(1-2):151–175, 2010. [4] A. Beutel, J. Chen, Z. Zhao, and E. H. Chi. Data decisions and theoretical implications when adversarially learning fair representations. Proceedings of the Conference on Fairness, Accountability and Transparency, 2017. [5] A. Beutel, J. Chen, T. Doshi, H. Qian, A. Woodruff, C. Luu, P. Kreitmann, J. Bischof, and E. H. Chi. Putting fairness principles into practice: Challenges, metrics, and improvements. Artificial Intelligence, Ethics, and Society, 2019. [6] K. Bousmalis, G. Trigeorgis, N. Silberman, D. Krishnan, and D. Erhan. Domain separation networks. In Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, December 5-10, 2016, Barcelona, Spain, pages 343–351, 2016. [7] I. Chen, F. D. Johansson, and D. Sontag. Why is my classifier discriminatory? arXiv preprint arXiv:1805.12002, 2018. [8] J. Chen, N. Kallus, X. Mao, G. Svacha, and M. Udell. Fairness under unawareness: Assessing disparity when protected class is unobserved. In FAT*, pages 339–348. ACM, 2019. [9] A. Coston, K. N. Ramamurthy, D. Wei, K. R. Varshney, S. Speakman, Z. Mustahsan, and S. Chakraborty. Fair transfer learning with missing protected attributes. In Proceedings of the AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society, Honolulu, HI, USA, 2019. [10] K. Crammer, M. Kearns, and J. Wortman. Learning from multiple sources. Journal of Machine Learning Research, 9(Aug):1757–1774, 2008. [11] L. Dixon, J. Li, J. Sorensen, N. Thain, and L. Vasserman. Measuring and mitigating unintended bias in text classification. Available at: www.aies-conference.com/wp-content/papers/main/AIES_2018_paper_9.pdf (accessed 6 August 2018), 2018. [12] H. Edwards and A. J. Storkey. Censoring representations with an adversary. In 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, Conference Track Proceedings, 2016.

[13] Y. Ganin, E. Ustinova, H. Ajakan, P. Germain, H. Larochelle, F. Laviolette, M. Marchand, and V. Lempitsky. Domain-adversarial training of neural networks. The Journal of Machine Learning Research, 17(1):2096–2030, 2016. [14] G. Goh, A. Cotter, M. Gupta, and M. P. Friedlander. Satisfying real-world goals with dataset constraints. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, editors, Advances in Neural Information Processing Systems 29, pages 2415–2423. Curran Associates, Inc., 2016. [15] A. Gretton, K. M. Borgwardt, M. J. Rasch, B. Schölkopf, and A. Smola. A kernel two-sample test. In The Journal of Machine Learning Research, 2012. [16] M. R. Gupta, A. Cotter, M. M. Fard, and S. Wang. Proxy fairness. CoRR, abs/1806.11212, 2018. URL http://arxiv.org/abs/1806.11212. [17] M. Hardt, E. Price, N. Srebro, et al. Equality of opportunity in supervised learning. In Advances in neural information processing systems, pages 3315–3323, 2016. [18] N. Kallus and A. Zhou. Residual unfairness in fair machine learning from prejudiced data. In Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018, pages 2444–2453, 2018. [19] N. Kallus and A. Zhou. Assessing disparate impacts of personalized interventions: Identifiability and bounds. arXiv preprint arXiv:1906.01552, 2019. [20] C. Lan and J. Huan. Discriminatory transfer. CoRR, 2017. URL http://arxiv.org/abs/1707.00780. [21] Y. Li, T. Baldwin, and T. Cohn. Towards robust and privacy-preserving text representations. arXiv preprint arXiv:1805.06093, 2018. [22] M. Long, Y. Cao, J. Wang, and M. Jordan. Learning transferable features with deep adaptation networks. In Proceedings of the 32nd International Conference on Machine Learning, 2015. [23] C. Louizos, K. Swersky, Y. Li, M. Welling, and R. S. Zemel. The variational fair autoencoder. In 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, Conference Track Proceedings, 2016. [24] D. Madras, E. Creager, T. Pitassi, and R. Zemel. Learning adversarially fair and transferable representations. arXiv preprint arXiv:1802.06309, 2018. [25] Y. Mansour, M. Mohri, and A. Rostamizadeh. Domain adaptation: Learning bounds and algorithms. COLT, 2009. [26] S. J. Pan, Q. Yang, et al. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10):1345–1359, 2010. [27] D. Sculley, G. Holt, D. Golovin, E. Davydov, T. Phillips, D. Ebner, V. Chaudhary, M. Young, J.-F. Crespo, and D. Dennison. Hidden technical debt in machine learning systems. In Advances in neural information processing systems, pages 2503–2511, 2015. [28] D. Tsipras, S. Santurkar, L. Engstrom, A. Turner, and A. Madry. There is no free lunch in adversarial robustness (but there are unexpected benefits). arXiv preprint arXiv:1805.12152, 2018. [29] K. Weiss, T. M. Khoshgoftaar, and D. Wang. A survey of transfer learning. Journal of Big Data, 2016. [30] M. B. Zafar, I. Valera, M. Gomez-Rodriguez, and K. P. Gummadi. Fairness constraints: Mechanisms for fair classification. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, AISTATS 2017, 20-22 April 2017, Fort Lauderdale, FL, USA, pages 962–970, 2017. [31] B. H. Zhang, B. Lemoine, and M. Mitchell. Mitigating unwanted biases with adversarial learning. CoRR, abs/1801.07593, 2018. URL http://arxiv.org/abs/1801.07593.

#49 Using Image Fairness Representations in Diversity-Based Re-ranking for Recommendations

Maximal Marginal Relevance (MMR) Fairness Maximal Marginal Relevance (FMMR)
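
MMR greedily picks, at each step, the candidate maximizing λ·sim(item, query) − (1 − λ)·(max similarity to items already selected); FMMR keeps that recipe but computes similarity on fairness-aware image representations. A minimal, hedged MMR sketch with random placeholder embeddings:

```python
# Minimal Maximal Marginal Relevance (MMR) sketch (Carbonell and Goldstein,
# reference [5]); embeddings here are random placeholders, where the paper's
# FMMR would plug in fairness-aware image representations instead.
import numpy as np

rng = np.random.default_rng(0)
items = rng.normal(size=(20, 8))                   # candidate item embeddings
items /= np.linalg.norm(items, axis=1, keepdims=True)
query = rng.normal(size=8)
query /= np.linalg.norm(query)

def mmr(items, query, k=5, lam=0.7):
    relevance = items @ query                      # cosine similarity to the query
    selected, remaining = [], list(range(len(items)))
    while remaining and len(selected) < k:
        def score(i):
            # penalty: similarity to the most similar already-selected item
            redundancy = max(items[i] @ items[j] for j in selected) if selected else 0.0
            return lam * relevance[i] - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

print("re-ranked top-5 item indices:", mmr(items, query))
```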

REFERENCES [1] [n. d.]. Burst. https://burst.shopify.com/. Accessed: 2018-04-18. [2] Abolfazl Asudeh, H. V. Jagadish, Julia Stoyanovich, and Gautam Das. 2017. Designing Fair Ranking Schemes. arXiv preprint arXiv:1712.09752 (2017). [3] Joy Buolamwini and Timnit Gebru. 2018. Gender shades: Intersectional accuracy disparities in commercial gender classification. In Conference on Fairness, Accountability and Transparency. 77–91. [4] Robin Burke, Nasim Sonboli, Masoud Mansoury, and Aldo Ordoñez-Gauger. 2017. Balanced Neighborhoods for Fairness-aware Collaborative Recommendation. (2017). [5] Jaime Carbonell and Jade Goldstein. 1998. The use of MMR, diversity-based reranking for reordering documents and producing summaries. In Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval. ACM, 335–336. [6] L Elisa Celis, Damian Straszak, and Nisheeth K Vishnoi. 2017. Ranking with fairness constraints. arXiv preprint arXiv:1704.06840 (2017). [7] Yoav Goldberg and Omer Levy. 2014. word2vec explained: Deriving Mikolov et al.'s negative-sampling word-embedding method. arXiv preprint arXiv:1402.3722 (2014). [8] Yushi Jing, David Liu, Dmitry Kislyuk, Andrew Zhai, Jiajing Xu, Jeff Donahue, and Sarah Tavel. 2015. Visual search at Pinterest. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 1889–1898. [9] Jeff Johnson, Matthijs Douze, and Hervé Jégou. 2017. Billion-scale similarity search with GPUs. arXiv preprint arXiv:1702.08734 (2017). [10] Honglak Lee. 2010. Unsupervised feature learning via sparse hierarchical representations. Stanford University. [11] Xia Ning and George Karypis. 2011. SLIM: Sparse linear methods for top-N recommender systems. In Data Mining (ICDM), 2011 IEEE 11th International Conference on. IEEE, 497–506. [12] Zhou Ren, Hailin Jin, Zhe Lin, Chen Fang, and Alan Yuille. 2016. Joint image-text representation by Gaussian visual-semantic embedding. In Proceedings of the 2016 ACM on Multimedia Conference. ACM, 207–211. [13] Scott Sanner, Shengbo Guo, Thore Graepel, Sadegh Kharazmi, and Sarvnaz Karimi. 2011. Diverse retrieval via greedy optimization of expected 1-call@k in a latent subtopic relevance model. In Proceedings of the 20th ACM international conference on Information and knowledge management. ACM, 1977–1980. [14] Devashish Shankar, Sujay Narumanchi, HA Ananya, Pramod Kompalli, and Krishnendu Chaudhury. 2017. Deep learning based large scale visual recommendation and search for E-Commerce. arXiv preprint arXiv:1703.02344 (2017). [15] Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014). [16] Ashudeep Singh and Thorsten Joachims. 2018. Fairness of Exposure in Rankings. arXiv preprint arXiv:1802.07281 (2018). [17] Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna. 2016. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2818–2826. [18] Fan Yang, Ajinkya Kale, Yury Bubnov, Leon Stein, Qiaosong Wang, Hadi Kiapour, and Robinson Piramuthu. 2017. Visual search at eBay. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2101–2110. [19] Ke Yang and Julia Stoyanovich. 2016. Measuring fairness in ranked outputs. arXiv preprint arXiv:1610.08559 (2016). 
[20] Jun Yu, Sunil Mohan, Duangmanee Pew Putthividhya, and Weng-Keen Wong. 2014. Latent dirichlet allocation based diversified retrieval for e-commerce search. In Proceedings of the 7th ACM international conference on Web search and data mining. ACM, 463–472. [21] Meike Zehlike, Francesco Bonchi, Carlos Castillo, Sara Hajian, Mohamed Megahed, and Ricardo Baeza-Yates. 2017. FA*IR: A fair top-k ranking algorithm. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management. ACM, 1569–1578. [22] Rich Zemel, Yu Wu, Kevin Swersky, Toni Pitassi, and Cynthia Dwork. 2013. Learning fair representations. In International Conference on Machine Learning. 325–333.

#50 Virtual Adversarial Training: A Regularization Method for Supervised and Semi-Supervised Learning

local distributional smoothness (LDS) random perturbation training (RPT) Street View House Numbers (SVHN) virtual adversarial training (VAT) virtual adversarial examples (VAEs)
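
VAT perturbs each input in the direction that most changes the model's current predictive distribution, found by power iteration on the gradient of a KL divergence; no labels are needed, hence "virtual". The hedged sketch below specializes one power-iteration step to a toy logistic model where that KL gradient has a closed form; the real method backpropagates through a deep network, and all values here are synthetic placeholders.

```python
# Minimal sketch of one power-iteration step for the virtual adversarial
# perturbation, on a toy logistic model p(y=1|x) = sigmoid(w.x) where the
# gradient of KL(p(x) || p(x + r)) with respect to r reduces to (q - p) * w.
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=32)            # stand-in model weights (no labels used below)
x = rng.normal(size=32)
xi, eps = 1e-6, 1.0                # finite-difference scale and perturbation norm

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
p = sigmoid(w @ x)                 # current predictive distribution at x

d = rng.normal(size=32)            # random unit direction to start from
d /= np.linalg.norm(d)
q = sigmoid(w @ (x + xi * d))      # prediction under the tiny trial perturbation
grad = (q - p) * w                 # closed-form KL gradient for this toy model
d = grad / (np.linalg.norm(grad) + 1e-12)

r_vadv = eps * d                   # virtual adversarial perturbation
print("p(y=1) at x:         ", p)
print("p(y=1) at x + r_vadv:", sigmoid(w @ (x + r_vadv)))
```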

REFERENCES [1] Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, et al. Tensorflow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467, 2016. [2] Mudassar Abbas, Jyri Kivinen, and Tapani Raiko. Understanding regularization by virtual adversarial training, ladder networks and others. In Workshop on ICLR, 2016. [3] Hirotugu Akaike. Information theory and an extension of the maximum likelihood principle. In Selected Papers of Hirotugu Akaike, pages 199–213. Springer, 1998. [4] Vladimir Igorevich Arnol'd. Mathematical methods of classical mechanics, volume 60. Springer Science & Business Media, 2013. [5] Philip Bachman, Ouais Alsharif, and Doina Precup. Learning with pseudo-ensembles. In NIPS, 2014. [6] Christopher M Bishop. Training with noise is equivalent to Tikhonov regularization. Neural Computation, 7(1):108–116, 1995. [7] Christopher M Bishop. Pattern Recognition and Machine Learning. Springer, 2006. [8] Ronan Collobert, Fabian Sinz, Jason Weston, and Léon Bottou. Large scale transductive SVMs. Journal of Machine Learning Research, 7(Aug):1687–1712, 2006. [9] Yarin Gal and Zoubin Ghahramani. Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In ICML, 2016. [10] Xavier Glorot, Antoine Bordes, and Yoshua Bengio. Deep sparse rectifier neural networks. In AISTATS, 2011. [11] Gene H Golub and Henk A van der Vorst. Eigenvalue computation in the 20th century. Journal of Computational and Applied Mathematics, 123(1):35–65, 2000. [12] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016. http://www.deeplearningbook.org. [13] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In NIPS, 2014. [14] Ian Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In ICLR, 2015. [15] Yves Grandvalet and Yoshua Bengio. Semi-supervised learning by entropy minimization. In NIPS, 2004. [16] Shixiang Gu and Luca Rigazio. Towards deep neural network architectures robust to adversarial examples. In Workshop on ICLR, 2015. [17] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Identity mappings in deep residual networks. In ECCV, 2016. [18] Gao Huang, Zhuang Liu, and Kilian Q Weinberger. Densely connected convolutional networks. In CVPR, 2017. [19] Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In ICML, 2015. [20] Kevin Jarrett, Koray Kavukcuoglu, Marc'Aurelio Ranzato, and Yann LeCun. What is the best multi-stage architecture for object recognition? In ICCV, 2009. [21] Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In ICLR, 2015. [22] Diederik Kingma, Shakir Mohamed, Danilo Jimenez Rezende, and Max Welling. Semi-supervised learning with deep generative models. In NIPS, 2014. [23] Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. Technical Report, University of Toronto, 2009. [24] Samuli Laine and Timo Aila. Temporal ensembling for semi-supervised learning. In ICLR, 2017. [25] Chen-Yu Lee, Saining Xie, Patrick Gallagher, Zhengyou Zhang, and Zhuowen Tu. Deeply-supervised nets. In AISTATS, 2015. [26] Min Lin, Qiang Chen, and Shuicheng Yan. Network in network. In ICLR, 2014. 
[27] Lars Maaløe, Casper Kaae Sønderby, Søren Kaae Sønderby, and Ole Winther. Auxiliary deep generative models. In ICML, 2016. [28] Andrew L Maas, Awni Y Hannun, and Andrew Y Ng. Rectifier nonlinearities improve neural network acoustic models. In ICML, 2013. [29] Shin-ichi Maeda. A Bayesian encourages dropout. arXiv preprint arXiv:1412.7003, 2014. [30] Takeru Miyato, Shin-ichi Maeda, Masanori Koyama, Ken Nakae, and Shin Ishii. Distributional smoothing with virtual adversarial training. In ICLR, 2016. [31] Vinod Nair and Geoffrey E Hinton. Rectified linear units improve restricted Boltzmann machines. In ICML, 2010. [32] Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, and Andrew Y Ng. Reading digits in natural images with unsupervised feature learning. In Workshop on deep learning and unsupervised feature learning on NIPS, 2011. [33] Antti Rasmus, Mathias Berglund, Mikko Honkala, Harri Valpola, and Tapani Raiko. Semi-supervised learning with ladder networks. In NIPS, 2015. [34] Russell Reed, Seho Oh, and RJ Marks. Regularization using jittered training data. In IJCNN. IEEE, 1992. [35] Mehdi Sajjadi, Mehran Javanmardi, and Tolga Tasdizen. Regularization with stochastic transformations and perturbations for deep semi-supervised learning. In NIPS, 2016. [36] Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. Improved techniques for training GANs. In NIPS, 2016. [37] Jost Tobias Springenberg. Unsupervised and semi-supervised learning with categorical generative adversarial networks. In ICLR, 2016. [38] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. In Workshop on ICLR, 2015. [39] Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: A simple way to prevent neural networks from overfitting. JMLR, 15(1), 2014. [40] Rupesh Kumar Srivastava, Klaus Greff, and Jürgen Schmidhuber. Highway networks. arXiv preprint arXiv:1505.00387, 2015. [41] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. In ICLR, 2014. [42] Theano Development Team. Theano: A Python framework for fast computation of mathematical expressions. arXiv preprint arXiv:1605.02688, 2016. [43] Andrej N Tikhonov and Vasiliy Y Arsenin. Solutions of ill-posed problems. Winston, 1977. [44] Seiya Tokui, Kenta Oono, Shohei Hido, and Justin Clayton. Chainer: a next-generation open source framework for deep learning. In Workshop on machine learning systems (LearningSys) on NIPS, 2015. [45] Stefan Wager, Sida Wang, and Percy S Liang. Dropout training as adaptive regularization. In NIPS, 2013. [46] Grace Wahba. Spline models for observational data. SIAM, 1990. [47] Sumio Watanabe. Algebraic geometry and statistical learning theory. Cambridge University Press, 2009. [48] Junbo Zhao, Michael Mathieu, Ross Goroshin, and Yann LeCun. Stacked what-where auto-encoders. In Workshop on ICLR, 2016. [49] Xiaojin Zhu and Zoubin Ghahramani. Learning from labeled and unlabeled data with label propagation. Technical report, Citeseer, 2002.