Awesome Trustworthy Deep Learning

The deployment of deep learning in real-world systems calls for a set of complementary technologies that will ensure that deep learning is trustworthy (Nicolas Papernot). The list covers different topics in emerging research areas including but not limited to out-of-distribution generalization, adversarial examples, backdoor attacks, model inversion attacks, machine unlearning, etc.

Updated daily from arXiv. This preview README only includes papers submitted to arXiv within the last year. More papers can be found here 📂 [Full List].

Table of Contents

Paper List

Survey

  • A Comprehensive Survey on Trustworthy Recommender Systems. [paper]

    • Wenqi Fan, Xiangyu Zhao, Xiao Chen, Jingran Su, Jingtong Gao, Lin Wang, Qidong Liu, Yiqi Wang, Han Xu, Lei Chen, Qing Li.
    • Key Word: Recommender Systems; Survey.
    • Digest We provide a comprehensive overview of Trustworthy Recommender systems (TRec) with a specific focus on six of the most important aspects; namely, Safety & Robustness, Nondiscrimination & Fairness, Explainability, Privacy, Environmental Well-being, and Accountability & Auditability. For each aspect, we summarize the recent related technologies and discuss potential research directions to help achieve trustworthy recommender systems in the future.
  • Non-Imaging Medical Data Synthesis for Trustworthy AI: A Comprehensive Survey. [paper]

    • Xiaodan Xing, Huanjun Wu, Lichao Wang, Iain Stenson, May Yong, Javier Del Ser, Simon Walsh, Guang Yang.
    • Key Word: Non-Imaging Medical Data Generation; Healthcare; Survey.
    • Digest State-of-the-art data synthesis algorithms, especially deep learning algorithms, focus more on imaging data while neglecting the synthesis of non-imaging healthcare data, including clinical measurements, medical signals and waveforms, and electronic healthcare records (EHRs). Thus, in this paper, we will review the synthesis algorithms, particularly for non-imaging medical data, with the aim of providing trustworthy AI in this domain. This tutorial-styled review paper will provide comprehensive descriptions of non-imaging medical data synthesis on aspects including algorithms, evaluations, limitations and future research directions.
  • Trustworthy Recommender Systems. [paper]

    • Shoujin Wang, Xiuzhen Zhang, Yan Wang, Huan Liu, Francesco Ricci.
    • Key Word: Survey; Recommender Systems; Trustworthy Recommendation.
    • Digest Recent years have witnessed an increasing number of threats to RSs, coming from attacks, system- and user-generated noise, and system bias. As a result, it has become clear that a strict focus on RS accuracy is limited and the research must consider other important factors, e.g., trustworthiness. For end users, a trustworthy RS (TRS) should not only be accurate, but also transparent, unbiased and fair as well as robust to noise or attacks. These observations actually led to a paradigm shift of the research on RSs: from accuracy-oriented RSs to TRSs. However, researchers lack a systematic overview and discussion of the literature in this novel and fast-developing field of TRSs. To this end, in this paper, we provide an overview of TRSs, including a discussion of the motivation and basic concepts of TRSs, a presentation of the challenges in building TRSs, and a perspective on the future directions in this area.
  • Trustworthy Graph Neural Networks: Aspects, Methods and Trends. [paper]

    • He Zhang, Bang Wu, Xingliang Yuan, Shirui Pan, Hanghang Tong, Jian Pei.
    • Key Word: Survey; Graph Neural Networks.
    • Digest We propose a comprehensive roadmap to build trustworthy GNNs from the view of the various computing technologies involved. In this survey, we introduce basic concepts and comprehensively summarise existing efforts for trustworthy GNNs from six aspects, including robustness, explainability, privacy, fairness, accountability, and environmental well-being. Additionally, we highlight the intricate cross-aspect relations between the above six aspects of trustworthy GNNs. Finally, we present a thorough overview of trending directions for facilitating the research and industrialisation of trustworthy GNNs.
  • A Survey on AI Sustainability: Emerging Trends on Learning Algorithms and Research Challenges. [paper]

    • Zhenghua Chen, Min Wu, Alvin Chan, Xiaoli Li, Yew-Soon Ong.
    • Key Word: Survey; Sustainability.
    • Digest The technical trend in realizing these successes has been towards increasingly complex and larger AI models so as to solve more complex problems at superior performance and robustness. This rapid progress, however, has taken place at the expense of substantial environmental costs and resources. Besides, debates on the societal impacts of AI, such as fairness, safety and privacy, have continued to grow in intensity. These issues have presented major concerns pertaining to the sustainable development of AI. In this work, we review major trends in machine learning approaches that can address the sustainability problem of AI.

Out-of-Distribution Generalization

  • Generalizing to unseen domains: a survey on domain generalization. [paper]

    • Jindong Wang, Cuiling Lan, Chang Liu, Yidong Ouyang, Tao Qin, Wang Lu, Yiqiang Chen, Wenjun Zeng, and Philip S. Yu. IEEE TKDE 2022
    • Key Word: Out-of-distribution generalization; Domain generalization
    • Digest This paper presents the first review of recent advances in this area. First, we provide a formal definition of domain generalization and discuss several related fields. We then thoroughly review the theories related to domain generalization and carefully analyze the theory behind generalization.
  • Margin Calibration for Long-Tailed Visual Recognition. [paper]

    • Yidong Wang, Bowen Zhang, Wenxin Hou, Zhen Wu, Jindong Wang, and Takahiro Shinozaki. ACML 2022
    • Key Word: long-tailed recognition; imbalance learning
    • Digest We study the relationship between the margins and logits (classification scores) and empirically observe the biased margins and the biased logits are positively correlated. We propose MARC, a simple yet effective MARgin Calibration function to dynamically calibrate the biased margins for unbiased logits. MARC is extremely easy: just three lines of code.
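
A minimal sketch of the idea behind MARC: learn a tiny per-class scale and bias on top of the frozen, long-tail-biased logits. The exact parameterization and training details in the paper may differ; `MarginCalibrator` and the setup below are illustrative assumptions.

```python
# Minimal sketch of per-class logit calibration in the spirit of MARC (assumption:
# the backbone is frozen and only the calibration parameters are trained).
import torch
import torch.nn as nn

class MarginCalibrator(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(num_classes))   # per-class scale
        self.bias = nn.Parameter(torch.zeros(num_classes))   # per-class bias

    def forward(self, logits: torch.Tensor) -> torch.Tensor:
        # logits: (batch, num_classes) from a model biased by the long-tailed training set.
        return self.scale * logits + self.bias

# Train only the calibrator (a handful of parameters) with a class-balanced loss.
calibrator = MarginCalibrator(num_classes=100)
optimizer = torch.optim.SGD(calibrator.parameters(), lr=0.1)
```
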
  • Domain generalization for activity recognition via adaptive feature fusion. [paper]

    • Xin Qin, Jindong Wang, Yiqiang Chen, Wang Lu, and Xinlong Jiang. ACM TIST 2022
    • Key Word: Domain generalization; Activity recognition.
    • Digest We propose Adaptive Feature Fusion for Activity Recognition (AFFAR), a domain generalization approach that learns to fuse the domain-invariant and domain-specific representations to improve the model’s generalization performance. AFFAR takes the best of both worlds where domain-invariant representations enhance the transferability across domains and domain-specific representations leverage the model discrimination power from each domain.
  • Memory-Guided Multi-View Multi-Domain Fake News Detection. [paper]

    • Yongchun Zhu, Qiang Sheng, Juan Cao, Qiong Nan, Kai Shu, Minghui Wu, Jindong Wang, and Fuzhen Zhuang. IEEE TKDE 2022
    • Key Word: Multi-domain learning; out-of-distribution generalization.
    • Digest We propose a Memory-guided Multi-view Multi-domain Fake News Detection Framework (M3FEND) to address these two challenges. We model news pieces from a multi-view perspective, including semantics, emotion, and style. Specifically, we propose a Domain Memory Bank to enrich domain information which could discover potential domain labels based on seen news pieces and model domain characteristics. Then, with enriched domain information as input, a Domain Adapter could adaptively aggregate discriminative information from multiple views for news in various domains.
  • Decoupled Federated Learning for ASR with Non-IID Data. [paper]

    • Han Zhu, Jindong Wang, Gaofeng Cheng, Pengyuan Zhang, and Yonghong Yan. Interspeech 2022
    • Key Word: Non-IID; Federated learning; Speech recognition.
    • Digest We tackle the non-IID issue in FL-based ASR with personalized FL, which learns personalized models for each client. Concretely, we propose two types of personalized FL approaches for ASR. Firstly, we adapt the personalization layer based FL for ASR, which keeps some layers locally to learn personalization models. Secondly, to reduce the communication and computation costs, we propose decoupled federated learning (DecoupleFL).
  • Semantic-Discriminative Mixup for Generalizable Sensor-based Cross-domain Activity Recognition. [paper]

    • Wang Lu, Jindong Wang, Yiqiang Chen, Sinno Pan, Chunyu Hu, and Xin Qin. ACM UbiComp 2022
    • Key Word: Domain generalization; Activity recognition.
    • Digest We propose a novel method called Semantic-Discriminative Mixup (SDMix) for generalizable cross-domain HAR. Firstly, we introduce semantic-aware Mixup that considers the activity semantic ranges to overcome the semantic inconsistency brought by domain differences. Secondly, we introduce the large margin loss to enhance the discrimination of Mixup to prevent misclassification brought by noisy virtual labels.
  • Exploiting Adapters for Cross-lingual Low-resource Speech Recognition. [paper]

    • Wenxin Hou, Han Zhu, Yidong Wang, Jindong Wang, Tao Qin, Renjun Xu, and Takahiro Shinozaki. TASLP 2022
    • Key Word: Cross-domain learning; Speech recognition.
    • Digest We propose a novel algorithm called SimAdapter for explicitly learning knowledge from adapters. Our algorithm leverages adapters, which can be easily integrated into the Transformer structure. MetaAdapter leverages meta-learning to transfer the general knowledge from training data to the test language, while SimAdapter aims to learn the similarities between the source and target languages during fine-tuning using the adapters.
  • Learning causal semantic representation for out-of-distribution prediction. [paper]

    • Chang Liu, Xinwei Sun, Jindong Wang, Haoyue Tang, Tao Li, Tao Qin, Wei Chen, and Tie-Yan Liu. NeurIPS 2021
    • Key Word: Out-of-distribution generalization; Causality
    • Digest We propose a Causal Semantic Generative model (CSG) based on causal reasoning so that the two factors are modeled separately, and develop methods for OOD prediction from a single training domain, which is common and challenging. The methods are based on the causal invariance principle, with a novel design in variational Bayes for both efficient learning and easy prediction.
  • Adarnn: Adaptive learning and forecasting of time series. [paper]

    • Yuntao Du, Jindong Wang, Wenjie Feng, Sinno Pan, Tao Qin, Renjun Xu, and Chongjun Wang. CIKM 2021
    • Key Word: Out-of-distribution prediction; Time series analysis
    • Digest This paper proposes Adaptive RNNs (AdaRNN) to tackle the TCS problem by building an adaptive model that generalizes well on the unseen test data. AdaRNN is sequentially composed of two novel algorithms. First, we propose Temporal Distribution Characterization to better characterize the distribution information in the TS. Second, we propose Temporal Distribution Matching to reduce the distribution mismatch in TS to learn the adaptive TS model. AdaRNN is a general framework with flexible distribution distances integrated.
  • Stable learning establishes some common ground between causal inference and machine learning. [paper]

    • Peng Cui, Susan Athey. Nature Machine Intelligence
    • Key Word: Stable Learning; Causal Inference.
    • Digest With the aim of bridging the gap between the tradition of precise modelling in causal inference and black-box approaches from machine learning, stable learning is proposed and developed as a source of common ground. This Perspective clarifies a source of risk for machine learning models and discusses the benefits of bringing causality into learning.
  • CrossMatch: Cross-Classifier Consistency Regularization for Open-Set Single Domain Generalization. [paper]

    • Ronghang Zhu, Sheng Li. ICLR 2022
    • Key Word: Single Domain Generalization, Open-Set Recognition.
    • Digest We propose a challenging and untouched problem: Open-Set Single Domain Generalization (OS-SDG), where target domains include unseen categories out of the source label space. The goal of OS-SDG is to learn a model, with only one source domain, that classifies a target sample into the correct class if it belongs to the source label space, or assigns it to an unknown class otherwise. We design a CrossMatch approach to improve the performance of SDG methods on identifying unknown classes by leveraging a multi-binary classifier.
  • Invariant Causal Representation Learning for Out-of-Distribution Generalization. [paper]

    • Chaochao Lu, Yuhuai Wu, José Miguel Hernández-Lobato, Bernhard Schölkopf. ICLR 2022
    • Key Word: Out-of-Distribution Generalization; Invariant Causal Prediction; Causal Representation Learning.
    • Digest We propose invariant Causal Representation Learning (iCaRL), an approach that enables out-of-distribution (OOD) generalization in the nonlinear setting (i.e., nonlinear representations and nonlinear classifiers). It builds upon a practical and general assumption: the prior over the data representation (i.e., a set of latent variables encoding the data) given the target and the environment belongs to general exponential family distributions, i.e., a more flexible conditionally non-factorized prior that can actually capture complicated dependences between the latent variables.
  • ImageNet-X: Understanding Model Mistakes with Factor of Variation Annotations. [paper]

    • Badr Youbi Idrissi, Diane Bouchacourt, Randall Balestriero, Ivan Evtimov, Caner Hazirbas, Nicolas Ballas, Pascal Vincent, Michal Drozdzal, David Lopez-Paz, Mark Ibrahim.
    • Key Word: Out-of-Distribution Generalization; Dataset.
    • Digest Deep learning vision systems are widely deployed across applications where reliability is critical. However, even today's best models can fail to recognize an object when its pose, lighting, or background varies. While existing benchmarks surface examples challenging for models, they do not explain why such mistakes arise. To address this need, we introduce ImageNet-X, a set of sixteen human annotations of factors such as pose, background, or lighting for the entire ImageNet-1k validation set as well as a random subset of 12k training images.
  • Functional Indirection Neural Estimator for Better Out-of-distribution Generalization. [paper]

    • Kha Pham, Hung Le, Man Ngo, Truyen Tran. NeurIPS 2022
    • Key Word: Out-of-Distribution Generalization; Functional Indirection Neural Estimator.
    • Digest We hypothesize that OOD generalization may be achieved by performing analogy-making and indirection in the functional space instead of the data space as in current methods. To realize this, we design FINE (Functional Indirection Neural Estimator), a neural framework that learns to compose functions that map data input to output on-the-fly. FINE consists of a backbone network and a trainable semantic memory of basis weight matrices.
  • Just Mix Once: Worst-group Generalization by Group Interpolation. [paper]

    • Giorgio Giannone, Serhii Havrylov, Jordan Massiah, Emine Yilmaz, Yunlong Jiao.
    • Key Word: Out-of-Distribution Generalization; Data Augmentation; Mixup; Group Robustness.
    • Digest A recent line of work leverages self-supervision and oversampling to improve generalization on minority groups without group annotation. We propose to unify and generalize these approaches using a class-conditional variant of mixup tailored for worst-group generalization. Our approach, Just Mix Once (JM1), interpolates samples during learning, augmenting the training distribution with a continuous mixture of groups. JM1 is domain agnostic and computationally efficient, can be used with any level of group annotation, and performs on par or better than the state-of-the-art on worst-group generalization.
  • On Feature Learning in the Presence of Spurious Correlations. [paper] [code]

    • Pavel Izmailov, Polina Kirichenko, Nate Gruver, Andrew Gordon Wilson. NeurIPS 2022
    • Key Word: Spurious Correlations; Feature Learning.
    • Digest We evaluate the amount of information about the core (non-spurious) features that can be decoded from the representations learned by standard empirical risk minimization (ERM) and specialized group robustness training. Following recent work on Deep Feature Reweighting (DFR), we evaluate the feature representations by re-training the last layer of the model on a held-out set where the spurious correlation is broken.
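
A minimal sketch of the DFR-style last-layer re-training used in this kind of evaluation, assuming a frozen feature extractor `backbone` that outputs flat feature vectors and a held-out loader where the spurious correlation is broken; both names are placeholders, not the paper's code.

```python
# Sketch: freeze the ERM-trained backbone, extract features, and refit only the
# last (linear) layer on held-out data where the spurious correlation is broken.
import torch
from sklearn.linear_model import LogisticRegression

@torch.no_grad()
def extract_features(backbone, loader, device="cpu"):
    backbone.eval()
    feats, labels = [], []
    for x, y in loader:
        feats.append(backbone(x.to(device)).cpu())   # assumed to return (batch, dim) features
        labels.append(y)
    return torch.cat(feats).numpy(), torch.cat(labels).numpy()

def refit_last_layer(backbone, heldout_loader, device="cpu"):
    X, y = extract_features(backbone, heldout_loader, device)
    clf = LogisticRegression(max_iter=1000)          # a linear probe stands in for the last layer
    clf.fit(X, y)
    return clf
```
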
  • Freeze then Train: Towards Provable Representation Learning under Spurious Correlations and Feature Noise. [paper]

    • Haotian Ye, James Zou, Linjun Zhang.
    • Key Word: Feature Learning; Spurious Correlations.
    • Digest We find that core features are only learned well when they are less noisy than spurious features, which is not necessarily true in practice. We provide both theories and experiments to support this finding and to illustrate the importance of feature noise. Moreover, we propose an algorithm called Freeze then Train (FTT), that first freezes certain salient features and then trains the rest of the features using ERM.
  • ZooD: Exploiting Model Zoo for Out-of-Distribution Generalization. [paper]

    • Qishi Dong, Awais Muhammad, Fengwei Zhou, Chuanlong Xie, Tianyang Hu, Yongxin Yang, Sung-Ho Bae, Zhenguo Li. NeurIPS 2022
    • Key Word: Out-of-Distribution Generalization; Pre-Training.
    • Digest We propose ZooD, a paradigm for PTMs ranking and ensemble with feature selection. Our proposed metric ranks PTMs by quantifying inter-class discriminability and inter-domain stability of the features extracted by the PTMs in a leave-one-domain-out cross-validation manner. The top-K ranked models are then aggregated for the target OoD task. To avoid accumulating noise induced by model ensemble, we propose an efficient variational EM algorithm to select informative features. We evaluate our paradigm on a diverse model zoo consisting of 35 models for various OoD tasks.
  • Improving Out-of-Distribution Generalization by Adversarial Training with Structured Priors. [paper] [code]

    • Qixun Wang, Yifei Wang, Hong Zhu, Yisen Wang. NeurIPS 2022
    • Key Word: Out-of-Distribution Generalization; Adversarial Training.
    • Digest We empirically show that sample-wise AT has limited improvement on OOD performance. Specifically, we find that AT can only maintain performance at smaller scales of perturbation while Universal AT (UAT) is more robust to larger-scale perturbations. This provides us with clues that adversarial perturbations with universal (low dimensional) structures can enhance the robustness against large data distribution shifts that are common in OOD scenarios. Inspired by this, we propose two AT variants with low-rank structures to train OOD-robust models.
  • Revisiting adapters with adversarial training. [paper]

    • Sylvestre-Alvise Rebuffi, Francesco Croce, Sven Gowal.
    • Key Word: Adversarial Training; Adapters.
    • Digest We improve upon the top-1 accuracy of a non-adversarially trained ViT-B16 model by +1.12% on ImageNet (reaching 83.76% top-1 accuracy). Second, and more importantly, we show that training with adapters enables model soups through linear combinations of the clean and adversarial tokens. These model soups, which we call adversarial model soups, allow us to trade-off between clean and robust accuracy without sacrificing efficiency. Finally, we show that we can easily adapt the resulting models in the face of distribution shifts.
  • FLamby: Datasets and Benchmarks for Cross-Silo Federated Learning in Realistic Healthcare Settings. [paper]

    • Jean Ogier du Terrail, Samy-Safwan Ayed, Edwige Cyffers, Felix Grimberg, Chaoyang He, Regis Loeb, Paul Mangold, Tanguy Marchand, Othmane Marfoq, Erum Mushtaq, Boris Muzellec, Constantin Philippenko, Santiago Silva, Maria Teleńczuk, Shadi Albarqouni, Salman Avestimehr, Aurélien Bellet, Aymeric Dieuleveut, Martin Jaggi, Sai Praneeth Karimireddy, Marco Lorenzi, Giovanni Neglia, Marc Tommasi, Mathieu Andreux. NeurIPS 2022
    • Key Word: Federated Learning; Healthcare; Benchmarks.
    • Digest We propose a novel cross-silo dataset suite focused on healthcare, FLamby (Federated Learning AMple Benchmark of Your cross-silo strategies), to bridge the gap between theory and practice of cross-silo FL. FLamby encompasses 7 healthcare datasets with natural splits, covering multiple tasks, modalities, and data volumes, each accompanied with baseline training code.
  • Coresets for Wasserstein Distributionally Robust Optimization Problems. [paper]

    • Ruomin Huang, Jiawei Huang, Wenjie Liu, Hu Ding. NeurIPS 2022
    • Key Word: Coresets; Distributionally Robust Optimization.
    • Digest We introduce a unified framework to construct the ϵ-coreset for the general WDRO problems. Though it is challenging to obtain a conventional coreset for WDRO due to the uncertainty issue of ambiguous data, we show that we can compute a ''dual coreset'' by using the strong duality property of WDRO.
  • Attention Diversification for Domain Generalization. [paper] [code]

    • Rang Meng, Xianfeng Li, Weijie Chen, Shicai Yang, Jie Song, Xinchao Wang, Lei Zhang, Mingli Song, Di Xie, Shiliang Pu. ECCV 2022
    • Key Word: Domain Generalization; Attention Diversification.
    • Digest We find the devils lie in the fact that models trained on different domains are merely biased toward different domain-specific features yet overlook diverse task-related features. Under this guidance, a novel Attention Diversification framework is proposed, in which Intra-Model and Inter-Model Attention Diversification Regularization collaborate to reassign appropriate attention to diverse task-related features.
  • Tikhonov Regularization is Optimal Transport Robust under Martingale Constraints. [paper]

    • Jiajin Li, Sirui Lin, Jose Blanchet, Viet Anh Nguyen. NeurIPS 2022
    • Key Word: Optimal Transport; Distributionally Robust Optimization; Tikhonov Regularization.
    • Digest Distributionally robust optimization has been shown to offer a principled way to regularize learning models. In this paper, we find that Tikhonov regularization is distributionally robust in an optimal transport sense (i.e., if an adversary chooses distributions in a suitable optimal transport neighborhood of the empirical measure), provided that suitable martingale constraints are also imposed. Further, we introduce a relaxation of the martingale constraints which not only provides a unified viewpoint to a class of existing robust methods but also leads to new regularization tools.
  • Nuisances via Negativa: Adjusting for Spurious Correlations via Data Augmentation. [paper]

    • Aahlad Puli, Nitish Joshi, He He, Rajesh Ranganath.
    • Key Word: Data Augmentation; Robustness to Spurious Correlations.
    • Digest We develop an alternative way to produce robust models by data augmentation. These data augmentations corrupt semantic information to produce models that identify and adjust for where nuisances drive predictions. We study semantic corruptions in powering different robust-modeling methods for multiple out-of-distribution (OOD) tasks like classifying waterbirds, natural language inference, and detecting Cardiomegaly in chest X-rays.
  • Federated Representation Learning via Maximal Coding Rate Reduction. [paper]

    • Juan Cervino, Navid NaderiAlizadeh, Alejandro Ribeiro.
    • Key Word: Personalized Federated Learning; Maximal Coding Rate Reduction.
    • Digest We propose a federated methodology to learn low-dimensional representations from a dataset that is distributed among several clients. In particular, we move away from the commonly-used cross-entropy loss in federated learning, and seek to learn shared low-dimensional representations of the data in a decentralized manner via the principle of maximal coding rate reduction (MCR2). Our proposed method, which we refer to as FLOW, utilizes MCR2 as the objective of choice, hence resulting in representations that are both between-class discriminative and within-class compressible.
  • MaskTune: Mitigating Spurious Correlations by Forcing to Explore. [paper] [code]

    • Saeid Asgari Taghanaki, Aliasghar Khani, Fereshte Khani, Ali Gholami, Linh Tran, Ali Mahdavi-Amiri, Ghassan Hamarneh. NeurIPS 2022
    • Key Word: Input Masking; Robustness to Spurious Correlations.
    • Digest A fundamental challenge of over-parameterized deep learning models is learning meaningful data representations that yield good performance on a downstream task without over-fitting spurious input features. This work proposes MaskTune, a masking strategy that prevents over-reliance on spurious (or a limited number of) features. MaskTune forces the trained model to explore new features during a single epoch finetuning by masking previously discovered features. MaskTune, unlike earlier approaches for mitigating shortcut learning, does not require any supervision, such as annotating spurious features or labels for subgroup samples in a dataset.
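
A minimal sketch of the MaskTune idea: estimate which input regions the trained model already relies on, mask them, and fine-tune for a single epoch on the masked inputs. Here a simple input-gradient saliency is used as the masking signal, which is an assumption; the paper's masking procedure may differ.

```python
# Sketch of masking previously discovered features and fine-tuning on masked inputs.
import torch
import torch.nn.functional as F

def mask_salient_regions(model, x, y, mask_ratio=0.2):
    x = x.clone().requires_grad_(True)
    F.cross_entropy(model(x), y).backward()
    sal = x.grad.abs().sum(dim=1, keepdim=True)                 # (B, 1, H, W) saliency
    k = max(1, int(mask_ratio * sal[0].numel()))
    thresh = sal.flatten(1).topk(k, dim=1).values[:, -1]        # per-sample threshold
    keep = (sal < thresh.view(-1, 1, 1, 1)).float()             # drop the top-k salient pixels
    return x.detach() * keep

def masktune_epoch(model, loader, optimizer, device="cpu"):
    model.train()
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        x_masked = mask_salient_regions(model, x, y)
        optimizer.zero_grad()                                   # also clears grads from the saliency pass
        F.cross_entropy(model(x_masked), y).backward()
        optimizer.step()
```
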
  • Domain Generalization -- A Causal Perspective. [paper]

    • Paras Sheth, Raha Moraffah, K. Selçuk Candan, Adrienne Raglin, Huan Liu.
    • Key Word: Domain Generalization; Causality; Survey.
    • Digest We present a comprehensive survey on causal domain generalization models from the aspects of the problem and causal theories. Furthermore, this survey includes in-depth insights into publicly accessible datasets and benchmarks for domain generalization in various domains. Finally, we conclude the survey with insights and discussions on future research directions.
  • A Closer Look at Learned Optimization: Stability, Robustness, and Inductive Biases. [paper]

    • James Harrison, Luke Metz, Jascha Sohl-Dickstein. NeurIPS 2022
    • Key Word: Optimizer; Out-of-Distribution Generalization.
    • Digest We use tools from dynamical systems to investigate the inductive biases and stability properties of optimization algorithms, and apply the resulting insights to designing inductive biases for blackbox optimizers. Our investigation begins with a noisy quadratic model, where we characterize conditions in which optimization is stable, in terms of eigenvalues of the training dynamics. We then introduce simple modifications to a learned optimizer's architecture and meta-training procedure which lead to improved stability, and improve the optimizer's inductive bias.
  • UMIX: Improving Importance Weighting for Subpopulation Shift via Uncertainty-Aware Mixup. [paper]

    • Zongbo Han, Zhipeng Liang, Fan Yang, Liu Liu, Lanqing Li, Yatao Bian, Peilin Zhao, Bingzhe Wu, Changqing Zhang, Jianhua Yao. NeurIPS 2022
    • Key Word: Importance Weighting; Subpopulation Shift; Mixup.
    • Digest We propose a simple yet practical framework, called uncertainty-aware mixup (Umix), to mitigate the overfitting issue in over-parameterized models by reweighting the "mixed" samples according to the sample uncertainty. The training-trajectories-based uncertainty estimation is equipped in the proposed Umix for each sample to flexibly characterize the subpopulation distribution. We also provide insightful theoretical analysis to verify that Umix achieves better generalization bounds over prior works.
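
A minimal sketch of uncertainty-aware mixup in the spirit of UMIX: standard mixup whose per-sample losses are reweighted by a precomputed uncertainty score. In the paper the uncertainty comes from training trajectories; here `uncertainty` is simply assumed to be given per sample, and the reweighting form is illustrative.

```python
import torch
import torch.nn.functional as F
from torch.distributions import Beta

def umix_style_loss(model, x, y, uncertainty, alpha=1.0):
    lam = Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0), device=x.device)
    x_mix = lam * x + (1 - lam) * x[perm]                  # standard mixup of inputs
    logits = model(x_mix)
    loss_a = F.cross_entropy(logits, y, reduction="none")
    loss_b = F.cross_entropy(logits, y[perm], reduction="none")
    w = 1.0 + uncertainty                                  # up-weight uncertain (often minority) samples
    return (lam * w * loss_a + (1 - lam) * w[perm] * loss_b).mean()
```
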
  • Importance Tempering: Group Robustness for Overparameterized Models. [paper]

    • Yiping Lu, Wenlong Ji, Zachary Izzo, Lexing Ying.
    • Key Word: Importance Tempering; Label Shift; Neural Collapse; Spurious Correlations.
    • Digest We propose importance tempering to improve the decision boundary and achieve consistently better results for overparameterized models. Theoretically, we justify that the selection of group temperature can be different under label shift and spurious correlation setting. At the same time, we also prove that properly selected temperatures can extricate the minority collapse for imbalanced classification.
  • On-Device Domain Generalization. [paper] [code]

    • Kaiyang Zhou, Yuanhan Zhang, Yuhang Zang, Jingkang Yang, Chen Change Loy, Ziwei Liu.
    • Key Word: Domain Generalization; Knowledge Distillation.
    • Digest We find that knowledge distillation is a strong candidate for solving the problem: it outperforms state-of-the-art DG methods that were developed using large models with a large margin. Moreover, we observe that the teacher-student performance gap on test data with domain shift is bigger than that on in-distribution data. To improve DG for tiny neural networks without increasing the deployment cost, we propose a simple idea called out-of-distribution knowledge distillation (OKD), which aims to teach the student how the teacher handles (synthetic) out-of-distribution data and is proved to be a promising framework for solving the problem.
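
A minimal sketch of out-of-distribution knowledge distillation: besides the usual label loss, the tiny student matches the large teacher's predictions on synthetically shifted inputs. `make_ood` is a placeholder for whatever augmentation or generator produces the synthetic OOD data, not the paper's specific choice.

```python
import torch
import torch.nn.functional as F

def okd_loss(student, teacher, x, y, make_ood, T=4.0, beta=1.0):
    ce = F.cross_entropy(student(x), y)                    # supervised loss on clean data
    x_ood = make_ood(x)                                    # synthetic OOD view of the batch
    with torch.no_grad():
        t_logits = teacher(x_ood)
    s_logits = student(x_ood)
    kd = F.kl_div(F.log_softmax(s_logits / T, dim=1),
                  F.softmax(t_logits / T, dim=1),
                  reduction="batchmean") * (T * T)         # match the teacher on OOD inputs
    return ce + beta * kd
```
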
  • ID and OOD Performance Are Sometimes Inversely Correlated on Real-world Datasets. [paper]

    • Damien Teney, Seong Joon Oh, Ehsan Abbasnejad.
    • Key Word: Out-of-Distribution Generalization.
    • Digest This short paper shows that inverse correlations between ID and OOD performance do happen in real-world benchmarks. They may have been missed in past studies because of a biased selection of models. We show an example of the pattern on the WILDS-Camelyon17 dataset, using models from multiple training epochs and random seeds. Our observations are particularly striking on models trained with a regularizer that diversifies the solutions to the ERM objective.
  • DR-DSGD: A Distributionally Robust Decentralized Learning Algorithm over Graphs. [paper]

    • Chaouki Ben Issaid, Anis Elgabli, Mehdi Bennis. TMLR
    • Key Word: Robust Federated Learning; Fairness in Federated Learning; Decentralized Learning.
    • Digest We propose to solve a regularized distributionally robust learning problem in the decentralized setting, taking into account the data distribution shift. By adding a Kullback-Leibler regularization function to the robust min-max optimization problem, the learning problem can be reduced to a modified robust minimization problem and solved efficiently. Leveraging the newly formulated optimization problem, we propose a robust version of Decentralized Stochastic Gradient Descent (DSGD), coined Distributionally Robust Decentralized Stochastic Gradient Descent (DR-DSGD).
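
With a KL penalty, the inner maximization of such a robust problem has a closed form, so the robust loss becomes an exponentially tilted (log-sum-exp) average of the individual losses. A minimal sketch of that objective is below; the decentralized gossip updates of DR-DSGD are omitted here.

```python
import torch

def kl_regularized_robust_loss(losses: torch.Tensor, temperature: float = 1.0) -> torch.Tensor:
    # losses: 1-D tensor of per-sample (or per-client) losses.
    # Closed-form dual of the KL-regularized worst-case expectation:
    #   temperature * log( mean( exp(losses / temperature) ) )
    n = losses.numel()
    return temperature * (torch.logsumexp(losses / temperature, dim=0)
                          - torch.log(torch.tensor(float(n), device=losses.device)))
```
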
  • Shortcut Learning of Large Language Models in Natural Language Understanding: A Survey. [paper]

    • Mengnan Du, Fengxiang He, Na Zou, Dacheng Tao, Xia Hu.
    • Key Word: Survey; Shortcut Learning; Out-of-Distribution Generalization; Large Language Models.
    • Digest Large language models (LLMs) have achieved state-of-the-art performance on a series of natural language understanding tasks. However, these LLMs might rely on dataset bias and artifacts as shortcuts for prediction. This has significantly hurt their Out-of-Distribution (OOD) generalization and adversarial robustness. In this paper, we provide a review of recent developments that address the robustness challenge of LLMs. We first introduce the concepts and robustness challenge of LLMs. We then introduce methods to identify shortcut learning behavior in LLMs, characterize the reasons for shortcut learning, as well as introduce mitigation solutions. Finally, we identify key challenges and introduce the connections of this line of research to other directions.
  • A Unified Causal View of Domain Invariant Representation Learning. [paper] [code]

    • Zihao Wang, Victor Veitch.
    • Key Word: Causality; Data Augmentation; Invariant Learning.
    • Digest Machine learning methods can be unreliable when deployed in domains that differ from the domains on which they were trained. To address this, we may wish to learn representations of data that are domain-invariant in the sense that we preserve data structure that is stable across domains, but throw out spuriously-varying parts. There are many representation-learning approaches of this type, including methods based on data augmentation, distributional invariances, and risk invariance. Unfortunately, when faced with any particular real-world domain shift, it is unclear which, if any, of these methods might be expected to work. The purpose of this paper is to show how the different methods relate to each other, and clarify the real-world circumstances under which each is expected to succeed. The key tool is a new notion of domain shift relying on the idea that causal relationships are invariant, but non-causal relationships (e.g., due to confounding) may vary.
  • Class Is Invariant to Context and Vice Versa: On Learning Invariance for Out-Of-Distribution Generalization. [paper] [code]

    • Jiaxin Qi, Kaihua Tang, Qianru Sun, Xian-Sheng Hua, Hanwang Zhang. ECCV 2022
    • Key Word: Invariant Learning; Out-of-Distribution Generalization.
    • Digest We argue that the widely adopted assumption in prior work, that the context bias can be directly annotated or estimated from biased class prediction, renders the context incomplete or even incorrect. In contrast, we point out the ever-overlooked other side of the above principle: context is also invariant to class, which motivates us to consider the classes (which are already labeled) as the varying environments to resolve context bias (without context labels). We implement this idea by minimizing the contrastive loss of intra-class sample similarity while assuring this similarity to be invariant across all classes.
  • Self-Distilled Vision Transformer for Domain Generalization. [paper] [code]

    • Maryam Sultana, Muzammal Naseer, Muhammad Haris Khan, Salman Khan, Fahad Shahbaz Khan. ECCV 2022
    • Key Word: Domain Generalization; Vision Transformers; Self Distillation.
    • Digest We attempt to explore ViTs towards addressing the DG problem. Similar to CNNs, ViTs also struggle in out-of-distribution scenarios and the main culprit is overfitting to source domains. Inspired by the modular architecture of ViTs, we propose a simple DG approach for ViTs, coined as self-distillation for ViTs. It reduces the overfitting to source domains by easing the learning of input-output mapping problem through curating non-zero entropy supervisory signals for intermediate transformer blocks.
  • Equivariance and Invariance Inductive Bias for Learning from Insufficient Data. [paper] [code]

    • Tan Wang, Qianru Sun, Sugiri Pranata, Karlekar Jayashree, Hanwang Zhang. ECCV 2022
    • Key Word: Visual Inductive Bias; Data-Efficient Learning; Out-of-Distribution Generalization.
    • Digest First, compared to sufficient data, we show why insufficient data renders the model more easily biased to the limited training environments that are usually different from testing. For example, if all the training swan samples are "white", the model may wrongly use the "white" environment to represent the intrinsic class swan. Then, we justify that equivariance inductive bias can retain the class feature while invariance inductive bias can remove the environmental feature, leaving the class feature that generalizes to any environmental changes in testing. To impose them on learning, for equivariance, we demonstrate that any off-the-shelf contrastive-based self-supervised feature learning method can be deployed; for invariance, we propose a class-wise invariant risk minimization (IRM) that efficiently tackles the challenge of missing environmental annotation in conventional IRM.
  • Domain-invariant Feature Exploration for Domain Generalization. [paper] [code]

    • Wang Lu, Jindong Wang, Haoliang Li, Yiqiang Chen, Xing Xie.
    • Key Word: Domain Generalization; Fourier Features.
    • Digest We argue that domain-invariant features should be originating from both internal and mutual sides. Internal invariance means that the features can be learned with a single domain and the features capture intrinsic semantics of data, i.e., the property within a domain, which is agnostic to other domains. Mutual invariance means that the features can be learned with multiple domains (cross-domain) and the features contain common information, i.e., the transferable features w.r.t. other domains.
  • Discrete Key-Value Bottleneck. [paper]

    • Frederik Träuble, Anirudh Goyal, Nasim Rahaman, Michael Mozer, Kenji Kawaguchi, Yoshua Bengio, Bernhard Schölkopf.
    • Key Word: Distribution Shifts; Catastrophic Forgetting; Memory Augmented Models.
    • Digest In the present work, we propose a model architecture to address this issue, building upon a discrete bottleneck containing pairs of separate and learnable (key, value) codes. In this setup, we follow the encode-process-decode paradigm: the input is fed to the pretrained encoder, the output of the encoder is used to select the nearest keys, and the corresponding values are fed to the decoder to solve the current task. The model can only fetch and re-use a limited number of these (key, value) pairs during inference, enabling localized and context-dependent model updates.
  • UniFed: A Benchmark for Federated Learning Frameworks. [paper] [code]

    • Xiaoyuan Liu, Tianneng Shi, Chulin Xie, Qinbin Li, Kangping Hu, Haoyu Kim, Xiaojun Xu, Bo Li, Dawn Song.
    • Key Word: Federated Learning; Benchmark; Privacy.
    • Digest Federated Learning (FL) has become a practical and popular paradigm in machine learning. However, currently, there is no systematic solution that covers diverse use cases. Practitioners often face the challenge of how to select a matching FL framework for their use case. In this work, we present UniFed, the first unified benchmark for standardized evaluation of the existing open-source FL frameworks. With 15 evaluation scenarios, we present both qualitative and quantitative evaluation results of nine existing popular open-sourced FL frameworks, from the perspectives of functionality, usability, and system performance. We also provide suggestions on framework selection based on the benchmark conclusions and point out future improvement directions.
  • Grounding Visual Representations with Texts for Domain Generalization. [paper] [code]

    • Seonwoo Min, Nokyung Park, Siwon Kim, Seunghyun Park, Jinkyu Kim. ECCV 2022
    • Key Word: Domain Generalization; Visual and Textual Explanations.
    • Digest We introduce two modules to ground visual representations with texts containing typical reasoning of humans: (1) Visual and Textual Joint Embedder and (2) Textual Explanation Generator. The former learns the image-text joint embedding space where we can ground high-level class-discriminative information into the model. The latter leverages an explainable model and generates explanations justifying the rationale behind its decision. To the best of our knowledge, this is the first work to leverage the vision-and-language cross-modality approach for the domain generalization task.
  • Tackling Long-Tailed Category Distribution Under Domain Shifts. [paper] [code]

    • Xiao Gu, Yao Guo, Zeju Li, Jianing Qiu, Qi Dou, Yuxuan Liu, Benny Lo, Guang-Zhong Yang. ECCV 2022
    • Key Word: Long-Tailed Category Distribution; Domain Generalization; Cross-Modal Learning.
    • Digest We took a step forward and looked into the problem of long-tailed classification under domain shifts. We designed three novel core functional blocks including Distribution Calibrated Classification Loss, Visual-Semantic Mapping and Semantic-Similarity Guided Augmentation. Furthermore, we adopted a meta-learning framework which integrates these three blocks to improve domain generalization on unseen target domains.
  • Overcoming Shortcut Learning in a Target Domain by Generalizing Basic Visual Factors from a Source Domain. [paper] [code]

    • Piyapat Saranrittichai, Chaithanya Kumar Mummadi, Claudia Blaiotta, Mauricio Munoz, Volker Fischer. ECCV 2022
    • Key Word: Compositional Generalization; Domain Generalization; Learning Independent Representations.
    • Digest Shortcut learning occurs when a deep neural network overly relies on spurious correlations in the training dataset in order to solve downstream tasks. Prior works have shown how this impairs the compositional generalization capability of deep learning models. To address this problem, we propose a novel approach to mitigate shortcut learning in uncontrolled target domains. Our approach extends the training set with an additional dataset (the source domain), which is specifically designed to facilitate learning independent representations of basic visual factors. We benchmark our idea on synthetic target domains where we explicitly control shortcut opportunities as well as real-world target domains.
  • Probable Domain Generalization via Quantile Risk Minimization. [paper] [code]

    • Cian Eastwood, Alexander Robey, Shashank Singh, Julius von Kügelgen, Hamed Hassani, George J. Pappas, Bernhard Schölkopf.
    • Key Word: Domain Generalization; Causality; Invariant Learning.
    • Digest A recent study found that no DG algorithm outperformed empirical risk minimization in terms of average performance. In this work, we argue that DG is neither a worst-case problem nor an average-case problem, but rather a probabilistic one. To this end, we propose a probabilistic framework for DG, which we call Probable Domain Generalization, wherein our key idea is that distribution shifts seen during training should inform us of probable shifts at test time. To realize this, we explicitly relate training and test domains as draws from the same underlying meta-distribution, and propose a new optimization problem -- Quantile Risk Minimization (QRM) -- which requires that predictors generalize with high probability.
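
A minimal sketch of the quantile-risk objective only (the paper's full EQRM algorithm involves estimating the distribution of risks): instead of the average or the worst per-domain risk, minimize an empirical alpha-quantile of the per-domain risks.

```python
import torch

def quantile_risk(per_domain_risks: torch.Tensor, alpha: float = 0.9) -> torch.Tensor:
    # per_domain_risks: (num_training_domains,) mean loss per training domain.
    # torch.quantile interpolates between order statistics, so gradients flow to the risks.
    return torch.quantile(per_domain_risks, alpha)
```
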
  • Assaying Out-Of-Distribution Generalization in Transfer Learning. [paper]

    • Florian Wenzel, Andrea Dittadi, Peter Vincent Gehler, Carl-Johann Simon-Gabriel, Max Horn, Dominik Zietlow, David Kernert, Chris Russell, Thomas Brox, Bernt Schiele, Bernhard Schölkopf, Francesco Locatello.
    • Key Word: Out-of-Distribution Generalization; Transfer Learning; Calibration; Adversarial Robustness; Corruption Robustness; Invariant Learning.
    • Digest Since out-of-distribution generalization is a generally ill-posed problem, various proxy targets (e.g., calibration, adversarial robustness, algorithmic corruptions, invariance across shifts) were studied across different research programs resulting in different recommendations. While sharing the same aspirational goal, these approaches have never been tested under the same experimental conditions on real data. In this paper, we take a unified view of previous work, highlighting message discrepancies that we address empirically, and providing recommendations on how to measure the robustness of a model and how to improve it. To this end, we collect 172 publicly available dataset pairs for training and out-of-distribution evaluation of accuracy, calibration error, adversarial attacks, environment invariance, and synthetic corruptions. We fine-tune over 31k networks, from nine different architectures in the many- and few-shot setting.
  • On the Strong Correlation Between Model Invariance and Generalization. [paper]

    • Weijian Deng, Stephen Gould, Liang Zheng.
    • Key Word: Predicting Generalization Gap; Out-of-Distribution Generalization.
    • Digest First, we introduce effective invariance (EI), a simple and reasonable measure of model invariance which does not rely on image labels. Given predictions on a test image and its transformed version, EI measures how well the predictions agree and with what level of confidence. Second, using invariance scores computed by EI, we perform large-scale quantitative correlation studies between generalization and invariance, focusing on rotation and grayscale transformations. From a model-centric view, we observe generalization and invariance of different models exhibit a strong linear relationship, on both in-distribution and out-of-distribution datasets. From a dataset-centric view, we find a certain model's accuracy and invariance linearly correlated on different test sets.
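
A minimal sketch of an effective-invariance (EI)-style score computed from a model's softmax outputs on an image and its transformed version; the exact formula in the paper may differ slightly.

```python
import torch

@torch.no_grad()
def effective_invariance(probs: torch.Tensor, probs_t: torch.Tensor) -> torch.Tensor:
    # probs, probs_t: (batch, num_classes) softmax outputs on original / transformed inputs.
    conf, pred = probs.max(dim=1)
    conf_t, pred_t = probs_t.max(dim=1)
    agree = (pred == pred_t).float()
    return agree * torch.sqrt(conf * conf_t)   # 0 when the two predictions disagree
```
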
  • Improved OOD Generalization via Conditional Invariant Regularizer. [paper]

    • Mingyang Yi, Ruoyu Wang, Jiachen Sun, Zhenguo Li, Zhi-Ming Ma.
    • Key Word: Out-of-Distribution Generalization; Conditional Spurious Variation.
    • Digest Recently, generalization on out-of-distribution (OOD) data with correlation shift has attracted great attention. The correlation shift is caused by the spurious attributes that correlate to the class label, as the correlation between them may vary in training and test data. For such a problem, we show that given the class label, the conditionally independent models of spurious attributes are OOD generalizable. Based on this, a metric Conditional Spurious Variation (CSV) which controls OOD generalization error, is proposed to measure such conditional independence. To improve the OOD generalization, we regularize the training process with the proposed CSV.
  • Models Out of Line: A Fourier Lens on Distribution Shift Robustness. [paper]

    • Sara Fridovich-Keil, Brian R. Bartoldson, James Diffenderfer, Bhavya Kailkhura, Peer-Timo Bremer.
    • Key Word: Predicting Out-of-Distribution Generalization; Frequency Analysis.
    • Digest There still is no clear understanding of the conditions on OOD data and model properties that are required to observe effective robustness. We approach this issue by conducting a comprehensive empirical study of diverse approaches that are known to impact OOD robustness on a broad range of natural and synthetic distribution shifts of CIFAR-10 and ImageNet. In particular, we view the "effective robustness puzzle" through a Fourier lens and ask how spectral properties of both models and OOD data influence the corresponding effective robustness.
  • Predicting is not Understanding: Recognizing and Addressing Underspecification in Machine Learning. [paper]

    • Damien Teney, Maxime Peyrard, Ehsan Abbasnejad. ECCV 2022
    • Key Word: Out-of-Distribution Generalization; Underspecification; Ensembles; Feature Diversity.
    • Digest We formalize the concept of underspecification and propose a method to identify and partially address it. We train multiple models with an independence constraint that forces them to implement different functions. They discover predictive features that are otherwise ignored by standard empirical risk minimization (ERM), which we then distill into a global model with superior OOD performance. Importantly, we constrain the models to align with the data manifold to ensure that they discover meaningful features. We demonstrate the method on multiple datasets in computer vision (collages, WILDS-Camelyon17, GQA) and discuss general implications of underspecification. Most notably, in-domain performance cannot serve for OOD model selection without additional assumptions.
  • Neural Networks and the Chomsky Hierarchy. [paper] [code]

    • Grégoire Delétang, Anian Ruoss, Jordi Grau-Moya, Tim Genewein, Li Kevin Wenliang, Elliot Catt, Marcus Hutter, Shane Legg, Pedro A. Ortega.
    • Key Word: Chomsky Hierarchy; Out-of-Distribution Generalization;
    • Digest Reliable generalization lies at the heart of safe ML and AI. However, understanding when and how neural networks generalize remains one of the most important unsolved problems in the field. In this work, we conduct an extensive empirical study (2200 models, 16 tasks) to investigate whether insights from the theory of computation can predict the limits of neural network generalization in practice. We demonstrate that grouping tasks according to the Chomsky hierarchy allows us to forecast whether certain architectures will be able to generalize to out-of-distribution inputs.
  • Multi-modal Robustness Analysis Against Language and Visual Perturbations. [paper] [code]

    • Madeline C. Schiappa, Yogesh S. Rawat, Shruti Vyas, Vibhav Vineet, Hamid Palangi.
    • Key Word: Corruption Robustness; Multi-modal Robustness; Text-to-Video Retrieval.
    • Digest Joint visual and language modeling on large-scale datasets has recently shown good progress in multi-modal tasks when compared to single modal learning. However, robustness of these approaches against real-world perturbations has not been studied. In this work, we perform the first extensive robustness study of such models against various real-world perturbations focusing on video and language. We focus on text-to-video retrieval and propose two large-scale benchmark datasets, MSRVTT-P and YouCook2-P, which utilize 90 different visual and 35 different textual perturbations.
  • Predicting Out-of-Domain Generalization with Local Manifold Smoothness. [paper]

    • Nathan Ng, Kyunghyun Cho, Neha Hulkund, Marzyeh Ghassemi.
    • Key Word: Measures of Complexity; Predicting Out-of-Distribution Generalization; Measuring Function Smoothness.
    • Digest Recent work has proposed a variety of complexity measures that directly predict or theoretically bound the generalization capacity of a model. However, these methods rely on a strong set of assumptions that in practice are not always satisfied. Motivated by the limited settings in which existing measures can be applied, we propose a novel complexity measure based on the local manifold smoothness of a classifier. We define local manifold smoothness as a classifier's output sensitivity to perturbations in the manifold neighborhood around a given test point. Intuitively, a classifier that is less sensitive to these perturbations should generalize better.
  • Benchmarking the Robustness of Deep Neural Networks to Common Corruptions in Digital Pathology. [paper] [code]

    • Yunlong Zhang, Yuxuan Sun, Honglin Li, Sunyi Zheng, Chenglu Zhu, Lin Yang. MICCAI 2022
    • Key Word: Corruption Robustness; Digital Pathology.
    • Digest When designing a diagnostic model for a clinical application, it is crucial to guarantee the robustness of the model with respect to a wide range of image corruptions. Herein, an easy-to-use benchmark is established to evaluate how deep neural networks perform on corrupted pathology images. Specifically, corrupted images are generated by injecting nine types of common corruptions into validation images. In addition, two classification metrics and one ranking metric are designed to evaluate the prediction and confidence performance under corruption.
  • Towards out of distribution generalization for problems in mechanics. [paper]

    • Lingxiao Yuan, Harold S. Park, Emma Lejeune.
    • Key Word: Out-of-Distribution Generalization; Invariant Learning.
    • Digest Out-of-distribution (OOD) generalization assumes that the test data may shift (i.e., violate the i.i.d. assumption). To date, multiple methods have been proposed to improve the OOD generalization of ML methods. However, because of the lack of benchmark datasets for OOD regression problems, the efficiency of these OOD methods on regression problems, which dominate the mechanics field, remains unknown. To address this, we investigate the performance of OOD generalization methods for regression problems in mechanics. Specifically, we identify three OOD problems: covariate shift, mechanism shift, and sampling bias. For each problem, we create two benchmark examples that extend the Mechanical MNIST dataset collection, and we investigate the performance of popular OOD generalization methods on these mechanics-specific regression problems.
  • Guillotine Regularization: Improving Deep Networks Generalization by Removing their Head. [paper]

    • Florian Bordes, Randall Balestriero, Quentin Garrido, Adrien Bardes, Pascal Vincent.
    • Key Word: Pre-training; Self-Supervision; Fine-tuning; Regularization; Out-of-Distribution Generalization.
    • Digest One unexpected technique that emerged in recent years consists in training a Deep Network (DN) with a Self-Supervised Learning (SSL) method, and using this network on downstream tasks but with its last few layers entirely removed. This usually skimmed-over trick is actually critical for SSL methods to display competitive performances. For example, on ImageNet classification, more than 30 points of percentage can be gained that way. This is a little vexing, as one would hope that the network layer at which invariance is explicitly enforced by the SSL criterion during training (the last layer) should be the one to use for best generalization performance downstream. But it seems not to be, and this study sheds some light on why. This trick, which we name Guillotine Regularization (GR), is in fact a generically applicable form of regularization that has also been used to improve generalization performance in transfer learning scenarios. In this work, through theory and experiments, we formalize GR and identify the underlying reasons behind its success in SSL methods.
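
A minimal sketch of the trick itself, assuming the pretrained SSL model is an `nn.Sequential`-style stack (real SSL codebases expose the cut point differently): discard the last few head/projector layers and train a probe on what remains.

```python
import torch.nn as nn

def guillotine(pretrained: nn.Sequential, layers_to_remove: int = 2) -> nn.Sequential:
    # Keep everything except the last `layers_to_remove` blocks (e.g., the SSL projector head).
    return nn.Sequential(*list(pretrained.children())[:-layers_to_remove])

# Usage: features = guillotine(ssl_model)(images); fit a linear classifier on these features.
```
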
  • Agreement-on-the-Line: Predicting the Performance of Neural Networks under Distribution Shift. [paper]

    • Christina Baek, Yiding Jiang, Aditi Raghunathan, Zico Kolter.
    • Key Word: Estimating Generalization Error; Distribution Shift.
    • Digest Recently, Miller et al. showed that a model's in-distribution (ID) accuracy has a strong linear correlation with its out-of-distribution (OOD) accuracy on several OOD benchmarks -- a phenomenon they dubbed ''accuracy-on-the-line''. While a useful tool for model selection (i.e., the model most likely to perform the best OOD is the one with highest ID accuracy), this fact does not help estimate the actual OOD performance of models without access to a labeled OOD validation set. In this paper, we show a similar but surprising phenomenon also holds for the agreement between pairs of neural network classifiers: whenever accuracy-on-the-line holds, we observe that the OOD agreement between the predictions of any two pairs of neural networks (with potentially different architectures) also observes a strong linear correlation with their ID agreement.
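
The agreement statistic itself needs no labels; a minimal sketch of computing it for a pair of classifiers on (possibly OOD) unlabeled data:

```python
import torch

@torch.no_grad()
def pairwise_agreement(model_a, model_b, loader, device="cpu"):
    model_a.eval(); model_b.eval()
    agree, total = 0, 0
    for x, _ in loader:                       # labels are ignored
        x = x.to(device)
        agree += (model_a(x).argmax(1) == model_b(x).argmax(1)).sum().item()
        total += x.size(0)
    return agree / total
```
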
  • Gated Domain Units for Multi-source Domain Generalization. [paper]

    • Simon Föll, Alina Dubatovka, Eugen Ernst, Martin Maritsch, Patrik Okanovic, Gudrun Thäter, Joachim M. Buhmann, Felix Wortmann, Krikamol Muandet.
    • Key Word: Multi-Source Domain Generalization; Invariant Elementary Distributions.
    • Digest Distribution shift (DS) is a common problem that deteriorates the performance of learning machines. To overcome this problem, we postulate that real-world distributions are composed of elementary distributions that remain invariant across different domains. We call this an invariant elementary distribution (I.E.D.) assumption. This invariance thus enables knowledge transfer to unseen domains. To exploit this assumption in domain generalization (DG), we developed a modular neural network layer that consists of Gated Domain Units (GDUs). Each GDU learns an embedding of an individual elementary domain that allows us to encode the domain similarities during the training. During inference, the GDUs compute similarities between an observation and each of the corresponding elementary distributions which are then used to form a weighted ensemble of learning machines.
  • On Certifying and Improving Generalization to Unseen Domains. [paper] [code]

    • Akshay Mehra, Bhavya Kailkhura, Pin-Yu Chen, Jihun Hamm.
    • Key Word: Certified Domain Generalization; Distributionally Robust Optimization.
    • Digest We demonstrate that the accuracy of the models trained with DG methods varies significantly across unseen domains, generated from popular benchmark datasets. This highlights that the performance of DG methods on a few benchmark datasets may not be representative of their performance on unseen domains in the wild. To overcome this roadblock, we propose a universal certification framework based on distributionally robust optimization (DRO) that can efficiently certify the worst-case performance of any DG method. This enables a data-independent evaluation of a DG method complementary to the empirical evaluations on benchmark datasets.
  • Out of distribution robustness with pre-trained Bayesian neural networks. [paper]

    • Xi Wang, Laurence Aitchison.
    • Key Word: Corruption Robustness; Pre-training; Bayesian Neural Networks.
    • Digest We develop ShiftMatch, a new training-data-dependent likelihood for out of distribution (OOD) robustness in Bayesian neural networks (BNNs). ShiftMatch is inspired by the training-data-dependent "EmpCov" priors from Izmailov et al. (2021a) and efficiently matches test-time spatial correlations to those at training time. Critically, ShiftMatch is designed to leave neural network training unchanged, allowing it to use publicly available samples from pretrained BNNs. Using pre-trained HMC samples, ShiftMatch gives strong performance improvements on CIFAR-10-C, outperforms EmpCov priors, and is perhaps the first Bayesian method capable of convincingly outperforming plain deep ensembles.
  • Invariant Causal Mechanisms through Distribution Matching. [paper]

    • Mathieu Chevalley, Charlotte Bunne, Andreas Krause, Stefan Bauer.
    • Key Word: Domain Generalization; Causal Inference.
    • Digest Learning representations that capture the underlying data generating process is a key problem for data efficient and robust use of neural networks. One key property for robustness which the learned representation should capture and which recently received a lot of attention is described by the notion of invariance. In this work we provide a causal perspective and new algorithm for learning invariant representations. Empirically we show that this algorithm works well on a diverse set of tasks and in particular we observe state-of-the-art performance on domain generalization, where we are able to significantly boost the score of existing models.
  • On Pre-Training for Federated Learning. [paper]

    • Hong-You Chen, Cheng-Hao Tu, Ziwei Li, Han-Wei Shen, Wei-Lun Chao.
    • Key Word: Pre-training; Federated Learning; Training with Synthetic Data.
    • Digest In most of the literature on federated learning (FL), neural networks are initialized with random weights. In this paper, we present an empirical study on the effect of pre-training on FL. Specifically, we aim to investigate if pre-training can alleviate the drastic accuracy drop when clients' decentralized data are non-IID. We focus on FedAvg, the fundamental and most widely used FL algorithm. We found that pre-training does largely close the gap between FedAvg and centralized learning under non-IID data, but this does not come from alleviating the well-known model drifting problem in FedAvg's local training. Instead, how pre-training helps FedAvg is by making FedAvg's global aggregation more stable. When pre-training using real data is not feasible for FL, we propose a novel approach to pre-train with synthetic data.
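For context, a minimal FedAvg round is sketched below; the paper's observation amounts to changing only the initialization of the global model (pre-trained, possibly on synthetic data) rather than the algorithm itself. The uniform averaging, function names, and the assumption that each client loader yields enough batches are simplifications.

```python
import copy
import torch
import torch.nn.functional as F

def fedavg_round(global_model, client_loaders, local_steps=10, lr=0.01):
    """One FedAvg round: local SGD on each client, then parameter averaging."""
    client_states = []
    for loader in client_loaders:
        local = copy.deepcopy(global_model)
        opt = torch.optim.SGD(local.parameters(), lr=lr)
        it = iter(loader)                 # assumes at least `local_steps` batches
        for _ in range(local_steps):
            x, y = next(it)
            opt.zero_grad()
            F.cross_entropy(local(x), y).backward()
            opt.step()
        client_states.append(local.state_dict())

    # Uniform averaging of client weights (weight by client dataset size in practice).
    avg_state = {
        k: torch.stack([s[k].float() for s in client_states]).mean(dim=0)
        for k in client_states[0]
    }
    global_model.load_state_dict(avg_state)
    return global_model

# Key point of the paper: initialize `global_model` from pre-trained weights
# (or weights trained on synthetic data) instead of a random init before running
# these rounds; the aggregation step above then becomes noticeably more stable.
```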
  • Fighting Fire with Fire: Avoiding DNN Shortcuts through Priming. [paper] [code]

    • Chuan Wen, Jianing Qian, Jierui Lin, Jiaye Teng, Dinesh Jayaraman, Yang Gao. ICML 2022
    • Key Word: Shortcut Removal; Out-of-Distribution Generalization.
    • Digest We show empirically that DNNs can be coaxed to avoid poor shortcuts by providing an additional "priming" feature computed from key input features, usually a coarse output estimate. Priming relies on approximate domain knowledge of these task-relevant key input features, which is often easy to obtain in practical settings. For example, one might prioritize recent frames over past frames in a video input for visual imitation learning, or salient foreground over background pixels for image classification.
  • Mitigating Data Heterogeneity in Federated Learning with Data Augmentation. [paper]

    • Artur Back de Luca, Guojun Zhang, Xi Chen, Yaoliang Yu.
    • Key Word: Federated Learning; Domain Generalization; Data Augmentation.
    • Digest While many approaches in DG tackle data heterogeneity from the algorithmic perspective, recent evidence suggests that data augmentation can induce equal or greater performance. Motivated by this connection, we present federated versions of popular DG algorithms, and show that by applying appropriate data augmentation, we can mitigate data heterogeneity in the federated setting, and obtain higher accuracy on unseen clients. Equipped with data augmentation, we can achieve state-of-the-art performance using even the most basic Federated Averaging algorithm, with much sparser communication.
  • How robust are pre-trained models to distribution shift? [paper]

    • Yuge Shi, Imant Daunhawer, Julia E. Vogt, Philip H.S. Torr, Amartya Sanyal.
    • Key Word: Distribution Shifts; Self-Supervised Pre-Training.
    • Digest The vulnerability of machine learning models to spurious correlations has mostly been discussed in the context of supervised learning (SL). However, there is a lack of insight on how spurious correlations affect the performance of popular self-supervised learning (SSL) and auto-encoder based models (AE). In this work, we shed light on this by evaluating the performance of these models on both real world and synthetic distribution shift datasets. Following observations that the linear head itself can be susceptible to spurious correlations, we develop a novel evaluation scheme with the linear head trained on out-of-distribution (OOD) data, to isolate the performance of the pre-trained models from a potential bias of the linear head used for evaluation.
  • Rectify ViT Shortcut Learning by Visual Saliency. [paper]

    • Chong Ma, Lin Zhao, Yuzhong Chen, David Weizhong Liu, Xi Jiang, Tuo Zhang, Xintao Hu, Dinggang Shen, Dajiang Zhu, Tianming Liu.
    • Key Word: Shortcut Learning; Vision Transformers; Eye Gaze Heatmap.
    • Digest We propose a novel and effective saliency-guided vision transformer (SGT) model to rectify shortcut learning in ViT in the absence of eye-gaze data. Specifically, a computational visual saliency model is adopted to predict saliency maps for input image samples. Then, the saliency maps are used to distil the most informative image patches. In the proposed SGT, the self-attention among image patches focuses only on the distilled informative ones.
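A hedged sketch of the patch-distillation step: rank ViT patch tokens by a per-patch saliency score and keep only the top fraction before self-attention. The saliency scores, keep ratio, and shapes below are placeholders; the actual SGT model is more involved.

```python
import torch

def select_salient_patches(patch_tokens: torch.Tensor,
                           saliency: torch.Tensor,
                           keep_ratio: float = 0.5) -> torch.Tensor:
    """Keep the top-k patch tokens ranked by a per-patch saliency score.

    patch_tokens: (batch, num_patches, dim) ViT patch embeddings.
    saliency:     (batch, num_patches) scores, e.g. a predicted saliency map
                  average-pooled onto the patch grid.
    """
    k = max(1, int(keep_ratio * patch_tokens.size(1)))
    idx = saliency.topk(k, dim=1).indices                        # (batch, k)
    idx = idx.unsqueeze(-1).expand(-1, -1, patch_tokens.size(-1))
    return patch_tokens.gather(dim=1, index=idx)                 # (batch, k, dim)

tokens = torch.randn(2, 196, 768)      # 14x14 patches of a 224x224 image
sal = torch.rand(2, 196)               # hypothetical saliency per patch
informative = select_salient_patches(tokens, sal, keep_ratio=0.25)  # (2, 49, 768)
```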
  • GOOD: A Graph Out-of-Distribution Benchmark. [paper] [code]

    • Shurui Gui, Xiner Li, Limei Wang, Shuiwang Ji.
    • Key Word: Graph Neural Networks; Covariate Shifts; Concept Shifts.
    • Digest There is currently no systematic benchmark tailored to graph OOD method evaluation. In this work, we aim at developing an OOD benchmark, known as GOOD, for graphs specifically. We explicitly make distinctions between covariate and concept shifts and design data splits that accurately reflect different shifts. We consider both graph and node prediction tasks as there are key differences when designing shifts. Overall, GOOD contains 8 datasets with 14 domain selections. When combined with covariate, concept, and no shifts, we obtain 42 different splits. We provide performance results on 7 commonly used baseline methods with 10 random runs. This results in 294 dataset-model combinations in total.
  • Modeling the Data-Generating Process is Necessary for Out-of-Distribution Generalization. [paper]

    • Jivat Neet Kaur, Emre Kiciman, Amit Sharma.
    • Key Word: Out-of-Distribution Generalization; Multi-attribute Distribution Shifts; Causal Graph.
    • Digest Real-world data collected from multiple domains can have multiple, distinct distribution shifts over multiple attributes. However, state-of-the-art advances in domain generalization (DG) algorithms focus only on specific shifts over a single attribute. We introduce datasets with multi-attribute distribution shifts and find that existing DG algorithms fail to generalize. To explain this, we use causal graphs to characterize the different types of shifts based on the relationship between spurious attributes and the classification label. Each multi-attribute causal graph entails different constraints over observed variables, and therefore any algorithm based on a single, fixed independence constraint cannot work well across all shifts. We present Causally Adaptive Constraint Minimization (CACM), a new algorithm for identifying the correct independence constraints for regularization.
  • What makes domain generalization hard? [paper]

    • Spandan Madan, Li You, Mengmi Zhang, Hanspeter Pfister, Gabriel Kreiman.
    • Key Word: Domain Generalization; Scene Context.
    • Digest While several methodologies have been proposed for the daunting task of domain generalization, understanding what makes this task challenging has received little attention. Here we present SemanticDG (Semantic Domain Generalization): a benchmark with 15 photo-realistic domains with the same geometry, scene layout and camera parameters as the popular 3D ScanNet dataset, but with controlled domain shifts in lighting, materials, and viewpoints. Using this benchmark, we investigate the impact of each of these semantic shifts on generalization independently.
  • Pareto Invariant Risk Minimization. [paper]

    • Yongqiang Chen, Kaiwen Zhou, Yatao Bian, Binghui Xie, Kaili Ma, Yonggang Zhang, Han Yang, Bo Han, James Cheng.
    • Key Word: Invariant Learning; Multi-Task Learning.
    • Digest Despite the success of invariant risk minimization (IRM) in tackling the Out-of-Distribution generalization problem, IRM can compromise the optimality when applied in practice. The practical variants of IRM, e.g., IRMv1, have been shown to have significant gaps with IRM and thus could fail to capture the invariance even in simple problems. Moreover, the optimization procedure in IRMv1 involves two intrinsically conflicting objectives, and often requires careful tuning for the objective weights. To remedy the above issues, we reformulate IRM as a multi-objective optimization problem, and propose a new optimization scheme for IRM, called PAreto Invariant Risk Minimization (PAIR).
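To make the "two intrinsically conflicting objectives" concrete, the sketch below computes the standard IRMv1 pair: the average empirical risk and the gradient penalty with respect to a dummy classifier scale. How PAIR then combines them (as a Pareto-seeking multi-objective optimization) is not reproduced here, and the helper names are hypothetical.

```python
import torch
import torch.nn.functional as F

def irmv1_penalty(logits: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """IRMv1 penalty: squared gradient of the risk w.r.t. a dummy scale of 1."""
    scale = torch.ones(1, device=logits.device, requires_grad=True)
    loss = F.cross_entropy(logits * scale, y)
    grad = torch.autograd.grad(loss, scale, create_graph=True)[0]
    return (grad ** 2).sum()

def irm_objectives(model, env_batches):
    """The two (often conflicting) objectives: mean ERM risk and mean penalty."""
    risks, penalties = [], []
    for x, y in env_batches:
        logits = model(x)
        risks.append(F.cross_entropy(logits, y))
        penalties.append(irmv1_penalty(logits, y))
    return torch.stack(risks).mean(), torch.stack(penalties).mean()

# IRMv1 minimizes risk + lambda * penalty with a hand-tuned lambda; PAIR instead
# treats (risk, penalty) as a multi-objective problem and seeks a good trade-off.
```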
  • Invariant Structure Learning for Better Generalization and Causal Explainability. [paper]

    • Yunhao Ge, Sercan Ö. Arik, Jinsung Yoon, Ao Xu, Laurent Itti, Tomas Pfister.
    • Key Word: Causal Structure Discovery; Explainability; Invariant Learning.
    • Digest Learning the causal structure behind data is invaluable for improving generalization and obtaining high-quality explanations. We propose a novel framework, Invariant Structure Learning (ISL), that is designed to improve causal structure discovery by utilizing generalization as an indication. ISL splits the data into different environments, and learns a structure that is invariant to the target across different environments by imposing a consistency constraint. An aggregation mechanism then selects the optimal classifier based on a graph structure that reflects the causal mechanisms in the data more accurately compared to the structures learnt from individual environments.
  • Causal Balancing for Domain Generalization. [paper]

    • Xinyi Wang, Michael Saxon, Jiachen Li, Hongyang Zhang, Kun Zhang, William Yang Wang.
    • Key Word: Invariant Learning; Causal Semantic Generative Model.
    • Digest While current domain generalization methods usually focus on enforcing certain invariance properties across different domains by new loss function designs, we propose a balanced mini-batch sampling strategy to reduce the domain-specific spurious correlations in the observed training distributions. More specifically, we propose a two-phased method that 1) identifies the source of spurious correlations, and 2) builds balanced mini-batches free from spurious correlations by matching on the identified source.
  • GSmooth: Certified Robustness against Semantic Transformations via Generalized Randomized Smoothing. [paper]

    • Zhongkai Hao, Chengyang Ying, Yinpeng Dong, Hang Su, Jun Zhu, Jian Song. ICML 2022
    • Key Word: Certified Semantic Robustness.
    • Digest Existing methods are insufficient or unable to provably defend against semantic transformations, especially those without closed-form expressions (such as defocus blur and pixelate), which are more common in practice and often unrestricted. To fill this gap, we propose generalized randomized smoothing (GSmooth), a unified theoretical framework for certifying robustness against general semantic transformations via a novel dimension augmentation strategy. Under the GSmooth framework, we present a scalable algorithm that uses a surrogate image-to-image network to approximate the complex transformation.
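For intuition only, here is plain randomized smoothing over a resolvable semantic transformation (rotation), written as a hypothetical sketch. GSmooth itself targets transformations without closed forms by learning a surrogate image-to-image network and certifying in an augmented dimension, which this snippet does not implement.

```python
import torch
import torchvision.transforms.functional as TF

@torch.no_grad()
def smoothed_rotation_predict(classifier, image, n_samples=100, sigma_deg=10.0):
    """Majority vote of a classifier over Gaussian-random rotations of one image.

    image: (C, H, W) tensor in the classifier's expected input range.
    """
    counts = None
    for _ in range(n_samples):
        angle = float(torch.randn(()) * sigma_deg)           # N(0, sigma^2) degrees
        logits = classifier(TF.rotate(image.unsqueeze(0), angle))
        if counts is None:
            counts = torch.zeros(logits.shape[1], dtype=torch.long)
        counts[logits.argmax(dim=1)] += 1
    # The vote counts are what a certification procedure would then bound.
    return counts.argmax().item(), counts
```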
  • Sparse Fusion Mixture-of-Experts are Domain Generalizable Learners. [paper] [code]

    • Bo Li, Jingkang Yang, Jiawei Ren, Yezhen Wang, Ziwei Liu.
    • Key Word: Domain Generalization; Vision Transformer; Sparse Mixture-of-Experts.
    • Digest We reveal the mixture-of-experts (MoE) model's generalizability on DG by leveraging its capacity to distributively handle multiple aspects of the predictive features across domains. To this end, we propose Sparse Fusion Mixture-of-Experts (SF-MoE), which incorporates sparsity and fusion mechanisms into the MoE framework to keep the model both sparse and predictive. SF-MoE has two dedicated modules: 1) sparse block and 2) fusion block, which disentangle and aggregate the diverse learned signals of an object, respectively.
  • Toward Certified Robustness Against Real-World Distribution Shifts. [paper]

    • Haoze Wu, Teruhiro Tagomori, Alexander Robey, Fengjun Yang, Nikolai Matni, George Pappas, Hamed Hassani, Corina Pasareanu, Clark Barrett.
    • Key Word: Certified Robustness; Distribution Shift.
    • Digest We propose a general meta-algorithm for handling sigmoid activations which leverages classical notions of counter-example-guided abstraction refinement. The key idea is to "lazily" refine the abstraction of sigmoid functions to exclude spurious counter-examples found in the previous abstraction, thus guaranteeing progress in the verification process while keeping the state-space small.
  • Can CNNs Be More Robust Than Transformers? [paper] [code]

    • Zeyu Wang, Yutong Bai, Yuyin Zhou, Cihang Xie.
    • Key Word: Transformers; Distribution Shift.
    • Digest We question the belief that Transformers are inherently more robust than CNNs by closely examining the design of Transformers. Our findings lead to three highly effective architecture designs for boosting robustness, yet simple enough to be implemented in several lines of code, namely a) patchifying input images, b) enlarging kernel size, and c) reducing activation layers and normalization layers.
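A toy PyTorch block illustrating the three designs in code form (a patchified stem, an enlarged depthwise kernel, and a single activation plus a single normalization). The exact dimensions, kernel sizes, and layer ordering are assumptions for illustration, not the paper's architectures.

```python
import torch
import torch.nn as nn

class RobustCNNBlock(nn.Module):
    """Toy block showing: (a) patchified stem, (b) enlarged depthwise kernel,
    (c) reduced number of activation and normalization layers."""
    def __init__(self, dim: int = 96):
        super().__init__()
        # (a) patchify: non-overlapping 8x8 "patch embedding" convolution.
        self.stem = nn.Conv2d(3, dim, kernel_size=8, stride=8)
        # (b) large-kernel depthwise convolution (e.g., 11x11 instead of 3x3).
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=11, padding=5, groups=dim)
        self.pwconv1 = nn.Conv2d(dim, 4 * dim, kernel_size=1)
        # (c) a single activation and a single normalization in the block.
        self.act = nn.GELU()
        self.norm = nn.BatchNorm2d(4 * dim)
        self.pwconv2 = nn.Conv2d(4 * dim, dim, kernel_size=1)

    def forward(self, x):
        x = self.stem(x)
        return x + self.pwconv2(self.norm(self.act(self.pwconv1(self.dwconv(x)))))

out = RobustCNNBlock()(torch.randn(1, 3, 224, 224))  # (1, 96, 28, 28)
```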
  • Distributionally Invariant Learning: Rationalization and Practical Algorithms. [paper]

    • Jiashuo Liu, Jiayun Wu, Jie Peng, Zheyan Shen, Bo Li, Peng Cui.
    • Key Word: Invariant Learning.
    • Digest We come up with the distributional invariance property as a relaxed alternative to the strict invariance, which considers the invariance only among sub-populations down to a prescribed scale and allows a certain degree of variation. We reformulate the invariant learning problem under latent heterogeneity into a relaxed form that pursues the distributional invariance, based on which we propose our novel Distributionally Invariant Learning (DIL) framework as well as two implementations named DIL-MMD and DIL-KL.
  • Generalized Federated Learning via Sharpness Aware Minimization. [paper]

    • Zhe Qu, Xingyu Li, Rui Duan, Yao Liu, Bo Tang, Zhuo Lu. ICML 2022
    • Key Word: Personalized Federated Learning.
    • Digest We revisit the solutions to the distribution shift problem in FL with a focus on local learning generality. To this end, we propose a general, effective algorithm, FedSAM, based on Sharpness Aware Minimization (SAM) local optimizer, and develop a momentum FL algorithm to bridge local and global models, MoFedSAM. Theoretically, we show the convergence analysis of these two algorithms and demonstrate the generalization bound of FedSAM. Empirically, our proposed algorithms substantially outperform existing FL studies and significantly decrease the learning deviation.
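A minimal sketch of the SAM update that serves as the client-side (local) optimizer in this line of work. The perturbation radius rho, the use of plain SGD as base_opt, and ignoring parameters without gradients are simplifying assumptions.

```python
import torch

def sam_step(model, loss_fn, x, y, base_opt, rho=0.05):
    """One Sharpness-Aware Minimization step.

    1) Perturb the weights to an (approximate) worst point within an L2 ball of radius rho.
    2) Compute the gradient there and apply it to the original weights.
    """
    # First forward/backward: gradient at the current weights.
    loss_fn(model(x), y).backward()
    grad_norm = torch.norm(
        torch.stack([p.grad.norm() for p in model.parameters() if p.grad is not None])
    )
    eps = []
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                eps.append(None)
                continue
            e = rho * p.grad / (grad_norm + 1e-12)
            p.add_(e)              # ascend to the perturbed point
            eps.append(e)
    model.zero_grad()

    # Second forward/backward: gradient at the perturbed weights.
    loss_fn(model(x), y).backward()
    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            if e is not None:
                p.sub_(e)          # return to the original weights
    base_opt.step()                # apply the sharpness-aware gradient
    base_opt.zero_grad()
```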
  • An Optimal Transport Approach to Personalized Federated Learning. [paper] [code]

    • Farzan Farnia, Amirhossein Reisizadeh, Ramtin Pedarsani, Ali Jadbabaie.
    • Key Word: Personalized Federated Learning; Optimal Transport.
    • Digest We focus on this problem and propose a novel personalized Federated Learning scheme based on Optimal Transport (FedOT) as a learning algorithm that learns the optimal transport maps for transferring data points to a common distribution as well as the prediction model under the applied transport map. To formulate the FedOT problem, we extend the standard optimal transport task between two probability distributions to multi-marginal optimal transport problems with the goal of transporting samples from multiple distributions to a common probability domain. We then leverage the results on multi-marginal optimal transport problems to formulate FedOT as a min-max optimization problem and analyze its generalization and optimization properties.
  • AugLoss: A Learning Methodology for Real-World Dataset Corruption. [paper]

    • Kyle Otstot, John Kevin Cava, Tyler Sypherd, Lalitha Sankar.
    • Key Word: Corruption Robustness; Data Augmentation.
    • Digest As a step towards addressing both problems simultaneously, we introduce AugLoss, a simple but effective methodology that achieves robustness against both train-time noisy labeling and test-time feature distribution shifts by unifying data augmentation and robust loss functions. We conduct comprehensive experiments in varied settings of real-world dataset corruption to showcase the gains achieved by AugLoss compared to previous state-of-the-art methods.
  • Drawing out of Distribution with Neuro-Symbolic Generative Models. [paper]

    • Yichao Liang, Joshua B. Tenenbaum, Tuan Anh Le, N. Siddharth.
    • Key Word: Out-of-Distribution Generalization; Neuro-Symbolic Generative Models.
    • Digest Learning general-purpose representations from perceptual inputs is a hallmark of human intelligence. For example, people can write out numbers or characters, or even draw doodles, by characterizing these tasks as different instantiations of the same generic underlying process -- compositional arrangements of different forms of pen strokes. Crucially, learning to do one task, say writing, implies reasonable competence at another, say drawing, on account of this shared process. We present Drawing out of Distribution (DooD), a neuro-symbolic generative model of stroke-based drawing that can learn such general-purpose representations. In contrast to prior work, DooD operates directly on images, requires no supervision or expensive test-time inference, and performs unsupervised amortised inference with a symbolic stroke model that better enables both interpretability and generalization.
  • On the Generalization of Wasserstein Robust Federated Learning. [paper]

    • Tung-Anh Nguyen, Tuan Dung Nguyen, Long Tan Le, Canh T. Dinh, Nguyen H. Tran.
    • Key Word: Wasserstein Distributionally Robust Optimization; Federated Learning.
    • Digest In federated learning, participating clients typically possess non-i.i.d. data, posing a significant challenge to generalization to unseen distributions. To address this, we propose a Wasserstein distributionally robust optimization scheme called WAFL. Leveraging its duality, we frame WAFL as an empirical surrogate risk minimization problem, and solve it using a local SGD-based algorithm with convergence guarantees. We show that the robustness of WAFL is more general than related approaches, and the generalization bound is robust to all adversarial distributions inside the Wasserstein ball (ambiguity set).
  • Federated Learning under Distributed Concept Drift. [paper]

    • Ellango Jothimurugesan, Kevin Hsieh, Jianyu Wang, Gauri Joshi, Phillip B. Gibbons.
    • Key Word: Concept Drift; Federated Learning.
    • Digest Our work is the first to explicitly study data heterogeneity in both dimensions. We first demonstrate that prior solutions to drift adaptation, with their single global model, are ill-suited to staggered drifts, necessitating multi-model solutions. We identify the problem of drift adaptation as a time-varying clustering problem, and we propose two new clustering algorithms for reacting to drifts based on local drift detection and hierarchical clustering.
  • Evolving Domain Generalization. [paper]

    • Wei Wang, Gezheng Xu, Ruizhi Pu, Jiaqi Li, Fan Zhou, Changjian Shui, Charles Ling, Christian Gagné, Boyu Wang.
    • Key Word: Domain Generalization.
    • Digest Domain generalization aims to learn a predictive model from multiple different but related source tasks that can generalize well to a target task without the need of accessing any target data. Existing domain generalization methods ignore the relationship between tasks, implicitly assuming that all the tasks are sampled from a stationary environment. Therefore, they can fail when deployed in an evolving environment. To this end, we formulate and study the evolving domain generalization (EDG) scenario, which exploits not only the source data but also their evolving pattern to generate a model for the unseen task.
  • Evaluating Robustness to Dataset Shift via Parametric Robustness Sets. [paper]

    • Nikolaj Thams, Michael Oberst, David Sontag.
    • Key Word: Distributionally Robust Optimization.
    • Digest We give a method for proactively identifying small, plausible shifts in distribution which lead to large differences in model performance. To ensure that these shifts are plausible, we parameterize them in terms of interpretable changes in causal mechanisms of observed variables. This defines a parametric robustness set of plausible distributions and a corresponding worst-case loss. While the loss under an individual parametric shift can be estimated via reweighting techniques such as importance sampling, the resulting worst-case optimization problem is non-convex, and the estimate may suffer from large variance.
  • PAC Generalisation via Invariant Representations. [paper]

    • Advait Parulekar, Karthikeyan Shanmugam, Sanjay Shakkottai.
    • Key Word: Invariant Learning; Causal Structure Learning; Domain Adaptation.
    • Digest We study the following question: If a representation is approximately invariant with respect to a given number of training interventions, will it continue to be approximately invariant on a larger collection of unseen SEMs? This larger collection of SEMs is generated through a parameterized family of interventions. Inspired by PAC learning, we obtain finite-sample out-of-distribution generalization guarantees for approximate invariance that holds probabilistically over a family of linear SEMs without faithfulness assumptions.
  • The Missing Invariance Principle Found -- the Reciprocal Twin of Invariant Risk Minimization. [paper]

    • Dongsung Huh, Avinash Baidya.
    • Key Word: Invariant Learning.
    • Digest We identify a fundamental flaw of the IRM formulation that causes the failure. We then introduce a complementary notion of invariance, MRI, based on conserving the class-conditioned feature expectation across environments, which corrects for the flaw in IRM. Further, we introduce a simplified, practical version of the MRI formulation called MRI-v1. We note that this constraint is convex, which confers it with an advantage over the practical version of IRM, IRM-v1, which imposes non-convex constraints. We prove that in a general linear problem setting, MRI-v1 can guarantee invariant predictors given sufficient environments.
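The class-conditioned feature-expectation matching behind this invariance notion can be sketched as a simple penalty, shown below under the assumption that every class appears in every environment batch. This is an illustration of the idea only, not the paper's MRI-v1 constraint.

```python
import torch

def class_conditional_mean_penalty(features_per_env, labels_per_env, num_classes: int):
    """Penalize differences in class-conditional feature means across environments.

    features_per_env: list of (n_e, d) feature tensors, one per environment.
    labels_per_env:   list of (n_e,) integer label tensors.
    Assumes each class is present in each environment's batch.
    """
    per_env_means = []
    for feats, labels in zip(features_per_env, labels_per_env):
        means = torch.stack([
            feats[labels == c].mean(dim=0) for c in range(num_classes)
        ])                                     # (num_classes, d)
        per_env_means.append(means)
    stacked = torch.stack(per_env_means)       # (num_envs, num_classes, d)
    return ((stacked - stacked.mean(dim=0, keepdim=True)) ** 2).mean()

feats = [torch.randn(128, 32), torch.randn(96, 32)]
labels = [torch.randint(0, 5, (128,)), torch.randint(0, 5, (96,))]
penalty = class_conditional_mean_penalty(feats, labels, num_classes=5)
```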
  • FL Games: A federated learning framework for distribution shifts. [paper]

    • Sharut Gupta, Kartik Ahuja, Mohammad Havaei, Niladri Chatterjee, Yoshua Bengio.
    • Key Word: Distribution Shifts; Federated Learning.
    • Digest We argue that in order to generalize better across non-i.i.d. clients, it is imperative to only learn correlations that are stable and invariant across domains. We propose FL Games, a game-theoretic framework for federated learning for learning causal features that are invariant across clients. While training to achieve the Nash equilibrium, the traditional best response strategy suffers from high-frequency oscillations. We demonstrate that FL Games effectively resolves this challenge and exhibits smooth performance curves.
  • Federated Learning Aggregation: New Robust Algorithms with Guarantees. [paper]

    • Adnan Ben Mansour, Gaia Carenini, Alexandre Duplessis, David Naccache.
    • Key Word: Federated Learning; Model Aggregation.
    • Digest We carry out a complete general mathematical convergence analysis to evaluate aggregation strategies in a federated learning framework. From this, we derive novel aggregation algorithms which are able to modify their model architecture by differentiating client contributions according to the value of their losses.
  • Interpolating Compressed Parameter Subspaces. [paper]

    • Siddhartha Datta, Nigel Shadbolt.
    • Key Word: Distribution Shifts; Weight Averaging; Test-time distributions; Task interpolation.
    • Digest Inspired by recent work on neural subspaces and mode connectivity, we revisit parameter subspace sampling for shifted and/or interpolatable input distributions (instead of a single, unshifted distribution). We enforce a compressed geometric structure upon a set of trained parameters mapped to a set of train-time distributions, denoting the resulting subspaces as Compressed Parameter Subspaces (CPS). We show the success and failure modes of the types of shifted distributions whose optimal parameters reside in the CPS. We find that ensembling point-estimates within a CPS can yield a high average accuracy across a range of test-time distributions, including backdoor, adversarial, permutation, stylization and rotation perturbations.
  • Improving Robustness against Real-World and Worst-Case Distribution Shifts through Decision Region Quantification. [paper]

    • Leo Schwinn, Leon Bungert, An Nguyen, René Raab, Falk Pulsmeyer, Doina Precup, Björn Eskofier, Dario Zanca. ICML 2022
    • Key Word: Decision Region Quantification; Corruption Robustness; Distribution Shift.
    • Digest We propose the Decision Region Quantification (DRQ) algorithm to improve the robustness of any differentiable pre-trained model against both real-world and worst-case distribution shifts in the data. DRQ analyzes the robustness of local decision regions in the vicinity of a given data point to make more reliable predictions. We theoretically motivate the DRQ algorithm by showing that it effectively smooths spurious local extrema in the decision surface.
  • FedILC: Weighted Geometric Mean and Invariant Gradient Covariance for Federated Learning on Non-IID Data. [paper] [code]

    • Mike He Zhu, Léna Néhale Ezzine, Dianbo Liu, Yoshua Bengio.
    • Key Word: Regularization; Federated Learning.
    • Digest We propose the Federated Invariant Learning Consistency (FedILC) approach, which leverages the gradient covariance and the geometric mean of Hessians to capture both inter-silo and intra-silo consistencies of environments and unravel the domain shift problems in federated networks.
  • Causality Inspired Representation Learning for Domain Generalization. [paper] [code]

    • Fangrui Lv, Jian Liang, Shuang Li, Bin Zang, Chi Harold Liu, Ziteng Wang, Di Liu. CVPR 2022
    • Key Word: Domain Generalization; Causality.
    • Digest We introduce a general structural causal model to formalize the DG problem. Specifically, we assume that each input is constructed from a mix of causal factors (whose relationship with the label is invariant across domains) and non-causal factors (category-independent), and only the former cause the classification judgments. Our goal is to extract the causal factors from inputs and then reconstruct the invariant causal mechanisms.
  • Closing the Generalization Gap of Cross-silo Federated Medical Image Segmentation. [paper]

    • An Xu, Wenqi Li, Pengfei Guo, Dong Yang, Holger Roth, Ali Hatamizadeh, Can Zhao, Daguang Xu, Heng Huang, Ziyue Xu.
    • Key Word: Personalized Federated Learning; Medical Image Segmentation.
    • Digest We propose a novel training framework FedSM to avoid the client drift issue and successfully close the generalization gap compared with the centralized training for medical image segmentation tasks for the first time. We also propose a novel personalized FL objective formulation and a new method SoftPull to solve it in our proposed framework FedSM.
  • Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time. [paper] [code]

    • Mitchell Wortsman, Gabriel Ilharco, Samir Yitzhak Gadre, Rebecca Roelofs, Raphael Gontijo-Lopes, Ari S. Morcos, Hongseok Namkoong, Ali Farhadi, Yair Carmon, Simon Kornblith, Ludwig Schmidt. ICML 2022
    • Key Word: Weight Averaging; Out-of-Distribution Generalization.
    • Digest We show that averaging the weights of multiple models fine-tuned with different hyperparameter configurations often improves accuracy and robustness. Unlike a conventional ensemble, we may average many models without incurring any additional inference or memory costs -- we call the results "model soups." When fine-tuning large pre-trained models such as CLIP, ALIGN, and a ViT-G pre-trained on JFT, our soup recipe provides significant improvements over the best model in a hyperparameter sweep on ImageNet.
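The uniform soup itself is essentially an average over checkpoints; a hedged sketch with hypothetical checkpoint paths follows.

```python
import torch

def uniform_soup(state_dicts):
    """Average the parameters of several fine-tuned checkpoints ("uniform soup")."""
    return {
        key: torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
        for key in state_dicts[0]
    }

# Hypothetical usage with checkpoints from a hyperparameter sweep of one architecture:
# paths = ["ft_lr1e-5.pt", "ft_lr3e-5.pt", "ft_lr1e-4.pt"]
# model.load_state_dict(uniform_soup([torch.load(p, map_location="cpu") for p in paths]))
#
# The paper's greedy soup adds checkpoints one at a time, keeping an addition only
# if held-out accuracy does not drop.
```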
  • Continual Feature Selection: Spurious Features in Continual Learning. [paper]

    • Timothée Lesort.
    • Key Word: Spurious Correlations; Continual Learning.
    • Digest This paper studies spurious features' influence on continual learning algorithms. We show that continual learning algorithms solve tasks by selecting features that are not generalizable. Our experiments highlight that continual learning algorithms face two related problems: (1) spurious features and (2) local spurious features. The first one is due to a covariate shift between training and testing data, while the second is due to the limited access to data at each training step. We study (1) through a consistent set of continual learning experiments varying spurious correlation amount and data distribution support. We show that (2) is a major cause of performance decrease in continual learning along with catastrophic forgetting.
  • Uncertainty Modeling for Out-of-Distribution Generalization. [paper] [code]

    • Xiaotong Li, Yongxing Dai, Yixiao Ge, Jun Liu, Ying Shan, Ling-Yu Duan. ICLR 2022
    • Key Word: Out-of-Distribution Generalization; Uncertainty.
    • Digest We improve the network generalization ability by modeling the uncertainty of domain shifts with synthesized feature statistics during training. Specifically, we hypothesize that the feature statistic, after considering the potential uncertainties, follows a multivariate Gaussian distribution. Hence, each feature statistic is no longer a deterministic value, but a probabilistic point with diverse distribution possibilities. With the uncertain feature statistics, the models can be trained to alleviate the domain perturbations and achieve better robustness against potential domain shifts.
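The idea of treating feature statistics as Gaussian-distributed can be sketched as a training-time perturbation of per-channel means and standard deviations, with the uncertainty estimated across the batch. Where the module is inserted and how the uncertainty is scaled are assumptions here, not the paper's exact formulation.

```python
import torch

def uncertain_stats_perturbation(x: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Resample per-channel feature statistics from a Gaussian during training.

    x: (batch, channels, H, W) intermediate feature map.
    """
    mu = x.mean(dim=(2, 3), keepdim=True)                  # (B, C, 1, 1)
    sig = (x.var(dim=(2, 3), keepdim=True) + eps).sqrt()

    # Uncertainty of the statistics themselves, estimated across the batch.
    mu_std = mu.std(dim=0, keepdim=True)
    sig_std = sig.std(dim=0, keepdim=True)

    # Sample new statistics and re-normalize the features with them.
    new_mu = mu + torch.randn_like(mu) * mu_std
    new_sig = sig + torch.randn_like(sig) * sig_std
    return new_sig * (x - mu) / sig + new_mu

features = torch.randn(16, 64, 32, 32)
perturbed = uncertain_stats_perturbation(features)  # applied only at training time
```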
  • Benchmarking and Analyzing Point Cloud Classification under Corruptions. [paper] [code]

    • Jiawei Ren, Liang Pan, Ziwei Liu. ICML 2022
    • Key Word: Corruption Robustness; Point Cloud Classification; Benchmarks.
    • Digest 3D perception, especially point cloud classification, has achieved substantial progress. However, in real-world deployment, point cloud corruptions are inevitable due to the scene complexity, sensor inaccuracy, and processing imprecision. In this work, we aim to rigorously benchmark and analyze point cloud classification under corruptions. To conduct a systematic investigation, we first provide a taxonomy of common 3D corruptions and identify the atomic corruptions. Then, we perform a comprehensive evaluation on a wide range of representative point cloud models to understand their robustness and generalizability.
  • Handling Distribution Shifts on Graphs: An Invariance Perspective. [paper] [code]

    • Qitian Wu, Hengrui Zhang, Junchi Yan, David Wipf. ICLR 2022
    • Key Word: Distribution Shifts; Graph Neural Networks.
    • Digest We formulate the OOD problem on graphs and develop a new invariant learning approach, Explore-to-Extrapolate Risk Minimization (EERM), that facilitates graph neural networks to leverage invariance principles for prediction. EERM resorts to multiple context explorers (specified as graph structure editors in our case) that are adversarially trained to maximize the variance of risks from multiple virtual environments.
  • Certifying Out-of-Domain Generalization for Blackbox Functions. [paper]

    • Maurice Weber, Linyi Li, Boxin Wang, Zhikuan Zhao, Bo Li, Ce Zhang.
    • Key Word: Certified Distributional Robustness; Out-of-Distribution Generalization.
    • Digest We focus on the problem of certifying distributional robustness for black box models and bounded losses, without other assumptions. We propose a novel certification framework given bounded distance of mean and variance of two distributions. Our certification technique scales to ImageNet-scale datasets, complex models, and a diverse range of loss functions. We then focus on one specific application enabled by such scalability and flexibility, i.e., certifying out-of-domain generalization for large neural networks and loss functions such as accuracy and AUC.
  • Provable Domain Generalization via Invariant-Feature Subspace Recovery. [paper] [code]

    • Haoxiang Wang, Haozhe Si, Bo Li, Han Zhao. ICML 2022
    • Key Word: Domain Generalization; Invariant Learning.
    • Digest We propose to achieve domain generalization with Invariant-feature Subspace Recovery (ISR). Our first algorithm, ISR-Mean, can identify the subspace spanned by invariant features from the first-order moments of the class-conditional distributions, and achieve provable domain generalization with ds+1 training environments under the data model of Rosenfeld et al. (2021). Our second algorithm, ISR-Cov, further reduces the required number of training environments to O(1) using the information of second-order moments.
  • Certifying Model Accuracy under Distribution Shifts. [paper]

    • Aounon Kumar, Alexander Levine, Tom Goldstein, Soheil Feizi.
    • Key Word: Certified Distributional Robustness; Corruption Robustness.
    • Digest Certified robustness in machine learning has primarily focused on adversarial perturbations of the input with a fixed attack budget for each point in the data distribution. In this work, we present provable robustness guarantees on the accuracy of a model under bounded Wasserstein shifts of the data distribution. We show that a simple procedure that randomizes the input of the model within a transformation space is provably robust to distributional shifts under the transformation. Our framework allows the datum-specific perturbation size to vary across different points in the input distribution and is general enough to include fixed-sized perturbations as well.

Evasion Attacks and Defenses

  • ReMoS: Reducing Defect Inheritance in Transfer Learning via Relevant Model Slicing. [paper]

    • Ziqi Zhang, Yuanchun Li, Jindong Wang, Bingyan Liu, Ding Li, Xiangqun Chen, Yao Guo, and Yunxin Liu. ICSE 2022
    • Key Word: Attack Defect Inheritance; Transfer Learning.
    • Digest We propose ReMoS, a relevant model slicing technique to reduce defect inheritance during transfer learning while retaining useful knowledge from the teacher model. Specifically, ReMoS computes a model slice (a subset of model weights) that is relevant to the student task based on the neuron coverage information obtained by profiling the teacher neurons.
  • Implicit Bias of Adversarial Training for Deep Neural Networks. [paper]

    • Bochen Lv, Zhanxing Zhu. ICLR 2022
    • Key Word: Adversarial Training.
    • Digest We provide theoretical understandings of the implicit bias imposed by adversarial training for homogeneous deep neural networks without any explicit regularization. In particular, for deep linear networks adversarially trained by gradient descent on a linearly separable dataset, we prove that the direction of the product of weight matrices converges to the direction of the max-margin solution of the original dataset.
  • Reducing Excessive Margin to Achieve a Better Accuracy vs. Robustness Trade-off. [paper] [code]

    • Rahul Rade, Seyed-Mohsen Moosavi-Dezfooli. ICLR 2022
    • Key Word: Adversarial Training.
    • Digest We closely examine the changes induced in the decision boundary of a deep network during adversarial training. We find that adversarial training leads to unwarranted increase in the margin along certain adversarial directions, thereby hurting accuracy. Motivated by this observation, we present a novel algorithm, called Helper-based Adversarial Training (HAT), to reduce this effect by incorporating additional wrongly labelled examples during training.
  • Physically Adversarial Attacks and Defenses in Computer Vision: A Survey. [paper]

    • Xingxing Wei, Bangzheng Pu, Jiefan Lu, Baoyuan Wu.
    • Key Word: Physical Adversarial Attacks; Survey.
    • Digest We present a survey of current physical adversarial attacks and physical adversarial defenses in computer vision. To establish a taxonomy, we organize the current physical attacks by attack task, attack form, and attack method, so that readers can gain systematic knowledge of this topic from different aspects. For the physical defenses, we establish the taxonomy from pre-processing, in-processing, and post-processing of the DNN models to achieve full coverage of the adversarial defenses.
  • DensePure: Understanding Diffusion Models towards Adversarial Robustness. [paper]

    • Chaowei Xiao, Zhongzhu Chen, Kun Jin, Jiongxiao Wang, Weili Nie, Mingyan Liu, Anima Anandkumar, Bo Li, Dawn Song.
    • Key Word: Diffusion Models; Adversarial Robustness.
    • Digest Diffusion models have been recently employed to improve certified robustness through the process of denoising. However, the theoretical understanding of why diffusion models are able to improve the certified robustness is still lacking, preventing further improvement. In this study, we close this gap by analyzing the fundamental properties of diffusion models and establishing the conditions under which they can enhance certified robustness. This deeper understanding allows us to propose a new method, DensePure, designed to improve the certified robustness of a pretrained model (i.e., classifier).
  • Adversarial Training with Complementary Labels: On the Benefit of Gradually Informative Attacks. [paper]

    • Jianan Zhou, Jianing Zhu, Jingfeng Zhang, Tongliang Liu, Gang Niu, Bo Han, Masashi Sugiyama. NeurIPS 2022
    • Key Word: Complementary Learning; Adversarial Training.
    • Digest To push AT towards more practical scenarios, we explore a brand new yet challenging setting, i.e., AT with complementary labels (CLs), which specify a class that a data sample does not belong to. However, the direct combination of AT with existing methods for CLs results in consistent failure, but not on a simple baseline of two-stage training. In this paper, we further explore the phenomenon and identify the underlying challenges of AT with CLs as intractable adversarial optimization and low-quality adversarial examples.
  • FI-ODE: Certified and Robust Forward Invariance in Neural ODEs. [paper] [code]

    • Yujia Huang, Ivan Dario Jimenez Rodriguez, Huan Zhang, Yuanyuan Shi, Yisong Yue.
    • Key Word: Certified Adversarial Robustness; Neural Ordinary Differential Equations.
    • Digest We study how to certifiably enforce forward invariance properties in neural ODEs. Forward invariance implies that the hidden states of the ODE will stay in a "good" region, and a robust version would hold even under adversarial perturbations to the input. Such properties can be used to certify desirable behaviors such as adversarial robustness (the hidden states stay in the region that generates accurate classification even under input perturbations) and safety in continuous control (the system never leaves some safe set).
  • Universal Adversarial Directions. [paper]

    • Ching Lam Choi, Farzan Farnia.
    • Key Word: Universal Adversarial Perturbations.
    • Digest We study the transferability of UAPs by analyzing equilibrium in the universal adversarial example game between the classifier and UAP adversary players. We show that under mild assumptions the universal adversarial example game lacks a pure Nash equilibrium, indicating UAPs' suboptimal transferability across DNN classifiers. To address this issue, we propose Universal Adversarial Directions (UADs) which only fix a universal direction for adversarial perturbations and allow the perturbations' magnitude to be chosen freely across samples.
  • Efficient and Effective Augmentation Strategy for Adversarial Training. [paper] [code]

    • Sravanti Addepalli, Samyak Jain, R.Venkatesh Babu. NeurIPS 2022
    • Key Word: Adversarial Training; Data Augmentation.
    • Digest We first explain this contrasting behavior by viewing augmentation during training as a problem of domain generalization, and further propose Diverse Augmentation-based Joint Adversarial Training (DAJAT) to use data augmentations effectively in adversarial training. We aim to handle the conflicting goals of enhancing the diversity of the training dataset and training with data that is close to the test distribution by using a combination of simple and complex augmentations with separate batch normalization layers during training.
  • Accelerating Certified Robustness Training via Knowledge Transfer. [paper]

    • Pratik Vaishnavi, Kevin Eykholt, Amir Rahmati. NeurIPS 2022
    • Key Word: Certified Adversarial Robustness; Transfer Learning.
    • Digest We propose Certified Robustness Transfer (CRT), a general-purpose framework for reducing the computational overhead of any certifiably robust training method through knowledge transfer. Given a robust teacher, our framework uses a novel training loss to transfer the teacher's robustness to the student. We provide theoretical and empirical validation of CRT.
  • Hindering Adversarial Attacks with Implicit Neural Representations. [paper]

    • Andrei A. Rusu, Dan A. Calian, Sven Gowal, Raia Hadsell.
    • Key Word: Adversarial Attacks; Implicit Neural Representations.
    • Digest We introduce the Lossy Implicit Network Activation Coding (LINAC) defence, an input transformation which successfully hinders several common adversarial attacks on CIFAR-10 classifiers for perturbations up to ϵ=8/255 in L∞ norm and ϵ=0.5 in L2 norm. Implicit neural representations are used to approximately encode pixel colour intensities in 2D images such that classifiers trained on transformed data appear to have robustness to small perturbations without adversarial training or large drops in performance.
  • Chaos Theory and Adversarial Robustness. [paper]

    • Jonathan S. Kent.
    • Key Word: Adversarial Robustness; Chaos Theory.
    • Digest Neural Networks, being susceptible to adversarial attacks, should face a strict level of scrutiny before being deployed in critical or adversarial applications. This paper uses ideas from Chaos Theory to explain, analyze, and quantify the degree to which Neural Networks are susceptible to or robust against adversarial attacks. Our results show that susceptibility to attack grows significantly with the depth of the model, which has significant safety implications for the design of Neural Networks for production environments. We also demonstrate how to quickly and easily approximate the certified robustness radii for extremely large models, which until now has been computationally infeasible to calculate directly, as well as show a clear relationship between our new susceptibility metric and post-attack accuracy.
  • Nash Equilibria and Pitfalls of Adversarial Training in Adversarial Robustness Games. [paper]

    • Maria-Florina Balcan, Rattana Pukdee, Pradeep Ravikumar, Hongyang Zhang.
    • Key Word: Adversarial Training; Game Theory.
    • Digest Adversarial training is a standard technique for training adversarially robust models. In this paper, we study adversarial training as an alternating best-response strategy in a 2-player zero-sum game. We prove that even in a simple scenario of a linear classifier and a statistical model that abstracts robust vs. non-robust features, the alternating best response strategy of such game may not converge. On the other hand, a unique pure Nash equilibrium of the game exists and is provably robust. We support our theoretical results with experiments, showing the non-convergence of adversarial training and the robustness of Nash equilibrium.
  • Evolution of Neural Tangent Kernels under Benign and Adversarial Training. [paper] [code]

    • Noel Loo, Ramin Hasani, Alexander Amini, Daniela Rus. NeurIPS 2022
    • Key Word: Adversarial Training; Neural Tangent Kernel.
    • Digest We perform an empirical study of the evolution of the empirical NTK under standard and adversarial training, aiming to disambiguate the effect of adversarial training on kernel learning and lazy training. We find under adversarial training, the empirical NTK rapidly converges to a different kernel (and feature map) than standard training.
  • LOT: Layer-wise Orthogonal Training on Improving l2 Certified Robustness. [paper] [code]

    • Xiaojun Xu, Linyi Li, Bo Li. NeurIPS 2022
    • Key Word: Certified Adversarial Robustness.
    • Digest We propose a layer-wise orthogonal training method (LOT) to effectively train 1-Lipschitz convolution layers via parametrizing an orthogonal matrix with an unconstrained matrix. We then efficiently compute the inverse square root of a convolution kernel by transforming the input domain to the Fourier frequency domain. On the other hand, as existing works show that semi-supervised training helps improve empirical robustness, we aim to bridge the gap and prove that semi-supervised learning also improves the certified robustness of Lipschitz-bounded models.
  • On the Adversarial Robustness of Mixture of Experts. [paper]

    • Joan Puigcerver, Rodolphe Jenatton, Carlos Riquelme, Pranjal Awasthi, Srinadh Bhojanapalli.
    • Key Word: Adversarial Robustness; Mixture of Experts.
    • Digest Recently, Bubeck and Sellke proved a lower bound on the Lipschitz constant of functions that fit the training data in terms of their number of parameters. This raises an interesting open question, do -- and can -- functions with more parameters, but not necessarily more computational cost, have better robustness? We study this question for sparse Mixture of Expert models (MoEs), that make it possible to scale up the model size for a roughly constant computational cost. We theoretically show that under certain conditions on the routing and the structure of the data, MoEs can have significantly smaller Lipschitz constants than their dense counterparts.
  • Scaling Adversarial Training to Large Perturbation Bounds. [paper] [code]

    • Sravanti Addepalli, Samyak Jain, Gaurang Sriramanan, R.Venkatesh Babu. ECCV 2022
    • Key Word: Adversarial Training.
    • Digest We aim to achieve adversarial robustness within larger bounds, against perturbations that may be perceptible, but do not change human (or Oracle) prediction. The presence of images that flip Oracle predictions and those that do not makes this a challenging setting for adversarial robustness. We discuss the ideal goals of an adversarial defense algorithm beyond perceptual limits, and further highlight the shortcomings of naively extending existing training algorithms to higher perturbation bounds. In order to overcome these shortcomings, we propose a novel defense, Oracle-Aligned Adversarial Training (OA-AT), to align the predictions of the network with those of an Oracle during adversarial training.
  • Adv-Attribute: Inconspicuous and Transferable Adversarial Attack on Face Recognition. [paper]

    • Shuai Jia, Bangjie Yin, Taiping Yao, Shouhong Ding, Chunhua Shen, Xiaokang Yang, Chao Ma. NeurIPS 2022
    • Key Word: Adversarial Transferability; Face Recognition.
    • Digest In this work, instead of performing perturbations on the low-level pixels, we propose to generate attacks through perturbing on the high-level semantics to improve attack transferability. Specifically, a unified flexible framework, Adversarial Attributes (Adv-Attribute), is designed to generate inconspicuous and transferable attacks on face recognition, which crafts the adversarial noise and adds it into different attributes based on the guidance of the difference in face recognition features from the target.
  • Double Bubble, Toil and Trouble: Enhancing Certified Robustness through Transitivity. [paper] [code]

    • Andrew C. Cullen, Paul Montague, Shijie Liu, Sarah M. Erfani, Benjamin I.P. Rubinstein. NeurIPS 2022
    • Key Word: Certified Adversarial Robustness; Randomised Smoothing.
    • Digest We demonstrate how today's "optimal" certificates can be improved by exploiting both the transitivity of certifications, and the geometry of the input space, giving rise to what we term Geometrically-Informed Certified Robustness. By considering the smallest distance to points on the boundary of a set of certifications, this approach improves certifications for more than 80% of Tiny-ImageNet instances, yielding an average 5% increase in the associated certification.
  • Boosting the Transferability of Adversarial Attacks with Reverse Adversarial Perturbation. [paper] [code]

    • Zeyu Qin, Yanbo Fan, Yi Liu, Li Shen, Yong Zhang, Jue Wang, Baoyuan Wu. NeurIPS 2022
    • Key Word: Adversarial Transferability.
    • Digest We propose a novel attack method, dubbed reverse adversarial perturbation (RAP). Specifically, instead of minimizing the loss of a single adversarial point, we advocate seeking adversarial examples located in a region with uniformly low loss values, by injecting the worst-case perturbation (the reverse adversarial perturbation) at each step of the optimization procedure. The adversarial attack with RAP is formulated as a min-max bi-level optimization problem.
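A hedged sketch of the min-max idea for an untargeted attack: the inner loop finds a "reverse" perturbation that works against the attack objective inside a small neighborhood, and the outer loop then ascends at that worst neighbor, so the final example sits in a region where the cross-entropy on the true label stays uniformly high. Step sizes, budgets, and the assumption of [0,1]-valued inputs are placeholders, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def rap_style_untargeted_attack(model, x, y, eps=8/255, alpha=2/255, steps=10,
                                inner_eps=8/255, inner_alpha=2/255, inner_steps=5):
    """Sketch of a transfer attack with a reverse (inner) perturbation."""
    delta = torch.zeros_like(x)
    for _ in range(steps):
        # Inner loop: reverse perturbation n that *minimizes* the loss near x + delta.
        n = torch.zeros_like(x)
        for _ in range(inner_steps):
            n.requires_grad_(True)
            loss = F.cross_entropy(model(x + delta + n), y)
            grad_n = torch.autograd.grad(loss, n)[0]
            with torch.no_grad():
                n = (n - inner_alpha * grad_n.sign()).clamp(-inner_eps, inner_eps)
        # Outer step: ascend the loss evaluated at the worst (lowest-loss) neighbor.
        delta.requires_grad_(True)
        loss = F.cross_entropy(model(x + delta + n.detach()), y)
        grad_d = torch.autograd.grad(loss, delta)[0]
        with torch.no_grad():
            delta = (delta + alpha * grad_d.sign()).clamp(-eps, eps)
            delta = (x + delta).clamp(0, 1) - x   # keep inputs in the valid range
    return (x + delta).detach()
```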
  • Robust Models are less Over-Confident. [paper] [code]

    • Julia Grabinski, Paul Gavrikov, Janis Keuper, Margret Keuper. NeurIPS 2022
    • Key Word: Adversarial Robustness; Confidence Calibration.
    • Digest We empirically analyze a variety of adversarially trained models that achieve high robust accuracies when facing state-of-the-art attacks, and we show that AT has an interesting side-effect: it leads to models that are significantly less overconfident in their decisions than non-robust models, even on clean data. Further, our analysis of robust models shows that not only AT but also the model's building blocks (like activation functions and pooling) have a strong influence on the models' prediction confidences.
  • What Can the Neural Tangent Kernel Tell Us About Adversarial Robustness? [paper]

    • Nikolaos Tsilivis, Julia Kempe. NeurIPS 2022
    • Key Word: Adversarial Robustness; Neural Tangent Kernel.
    • Digest We show how NTKs allow us to generate adversarial examples in a "training-free" fashion, and demonstrate that they transfer to fool their finite-width neural net counterparts in the "lazy" regime. We leverage this connection to provide an alternative view on robust and non-robust features, which have been suggested to underlie the adversarial brittleness of neural nets. Specifically, we define and study features induced by the eigendecomposition of the kernel to better understand the role of robust and non-robust features, the reliance on both for standard classification and the robustness-accuracy trade-off.
  • ViewFool: Evaluating the Robustness of Visual Recognition to Adversarial Viewpoints. [paper]

    • Yinpeng Dong, Shouwei Ruan, Hang Su, Caixin Kang, Xingxing Wei, Jun Zhu. NeurIPS 2022
    • Key Word: Adversarial Robustness; Robustness to 3D Variations; Novel View Synthesis; Neural Rendering.
    • Digest We propose a novel method called ViewFool to find adversarial viewpoints that mislead visual recognition models. By encoding real-world objects as neural radiance fields (NeRF), ViewFool characterizes a distribution of diverse adversarial viewpoints under an entropic regularizer, which helps to handle the fluctuations of the real camera pose and mitigate the reality gap between the real objects and their neural representations.
  • Towards Out-of-Distribution Adversarial Robustness. [paper]

    • Adam Ibrahim, Charles Guille-Escuret, Ioannis Mitliagkas, Irina Rish, David Krueger, Pouya Bashivan.
    • Key Word: Adversarial Robustness; Out-of-Distribution Generalization.
    • Digest Adversarial robustness continues to be a major challenge for deep learning. A core issue is that robustness to one type of attack often fails to transfer to other attacks. While prior work establishes a theoretical trade-off in robustness against different Lp norms, we show that there is potential for improvement against many commonly used attacks by adopting a domain generalisation approach. Concretely, we treat each type of attack as a domain, and apply the Risk Extrapolation method (REx), which promotes similar levels of robustness against all training attacks.
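The domain-generalisation framing can be sketched directly: treat each attack type as an environment and add a V-REx-style variance penalty over the per-attack risks. The attack callables and the penalty weight below are hypothetical; this illustrates the framing, not the paper's training recipe.

```python
import torch
import torch.nn.functional as F

def rex_over_attacks_objective(model, x, y, attacks, beta=10.0):
    """Risk Extrapolation across attack "domains": mean risk + variance penalty.

    attacks: list (length >= 2) of callables, each mapping (model, x, y) to
             perturbed inputs, e.g. PGD-Linf, PGD-L2 (hypothetical implementations).
    """
    risks = torch.stack([
        F.cross_entropy(model(attack(model, x, y)), y) for attack in attacks
    ])
    return risks.mean() + beta * risks.var()
```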
  • Pre-trained Adversarial Perturbations. [paper]

    • Yuanhao Ban, Yinpeng Dong. NeurIPS 2022
    • Key Word: Adversarial Robustness; Self-Supervision.
    • Digest We delve into the robustness of pre-trained models by introducing Pre-trained Adversarial Perturbations (PAPs), which are universal perturbations crafted for the pre-trained models to maintain the effectiveness when attacking fine-tuned ones without any knowledge of the downstream tasks. To this end, we propose a Low-Level Layer Lifting Attack (L4A) method to generate effective PAPs by lifting the neuron activations of low-level layers of the pre-trained models.
  • Practical Adversarial Attacks on Spatiotemporal Traffic Forecasting Models. [paper] [code]

    • Fan Liu, Hao Liu, Wenzhao Jiang. NeurIPS 2022
    • Key Word: Spatiotemporal Traffic Forecasting; Adversarial Attack.
    • Digest We investigate the vulnerability of spatiotemporal traffic forecasting models and propose a practical adversarial spatiotemporal attack framework. Specifically, instead of simultaneously attacking all geo-distributed data sources, an iterative gradient-guided node saliency method is proposed to identify the time-dependent set of victim nodes. Furthermore, we devise a spatiotemporal gradient descent based scheme to generate real-valued adversarial traffic states under a perturbation constraint. Meanwhile, we theoretically demonstrate the worst performance bound of adversarial traffic forecasting attacks.
  • On the Robustness of Deep Clustering Models: Adversarial Attacks and Defenses. [paper]

    • Anshuman Chhabra, Ashwin Sekhari, Prasant Mohapatra. NeurIPS 2022
    • Key Word: Deep Clustering; Adversarial Robustness.
    • Digest While traditional clustering approaches have been analyzed from a robustness perspective, no prior work has investigated adversarial attacks and robustness for deep clustering models in a principled manner. To bridge this gap, we propose a blackbox attack using Generative Adversarial Networks (GANs) where the adversary does not know which deep clustering model is being used, but can query it for outputs. We analyze our attack against multiple state-of-the-art deep clustering models and real-world datasets, and find that it is highly successful. We then employ some natural unsupervised defense approaches, but find that these are unable to mitigate our attack.
  • Natural Color Fool: Towards Boosting Black-box Unrestricted Attacks. [paper] [code]

    • Shengming Yuan, Qilong Zhang, Lianli Gao, Yaya Cheng, Jingkuan Song. NeurIPS 2022
    • Key Word: Unrestricted Color Attacks.
    • Digest Unrestricted color attacks, which manipulate semantically meaningful color of an image, have shown their stealthiness and success in fooling both human eyes and deep neural networks. However, current works usually sacrifice the flexibility of the uncontrolled setting to ensure the naturalness of adversarial examples. As a result, the black-box attack performance of these methods is limited. To boost transferability of adversarial examples without damaging image quality, we propose a novel Natural Color Fool (NCF) which is guided by realistic color distributions sampled from a publicly available dataset and optimized by our neighborhood search and initialization reset.
  • Rethinking Lipschitz Neural Networks for Certified L-infinity Robustness. [paper]

    • Bohang Zhang, Du Jiang, Di He, Liwei Wang. NeurIPS 2022
    • Key Word: Certified L-infinity Robustness; Randomized Smoothing.
    • Digest We derive two fundamental impossibility results that hold for any standard Lipschitz network: one for robust classification on finite datasets, and the other for Lipschitz function approximation. These results identify that networks built upon norm-bounded affine layers and Lipschitz activations intrinsically lose expressive power even in the two-dimensional case, and shed light on how recently proposed Lipschitz networks (e.g., GroupSort and ℓ∞-distance nets) bypass these impossibilities by leveraging order statistic functions.
  • Perceptual Attacks of No-Reference Image Quality Models with Human-in-the-Loop. [paper]

    • Weixia Zhang, Dingquan Li, Xiongkuo Min, Guangtao Zhai, Guodong Guo, Xiaokang Yang, Kede Ma. NeurIPS 2022
    • Key Word: Adversarial Robustness; No-reference Image Quality Assessment.
    • Digest No-reference image quality assessment (NR-IQA) aims to quantify how humans perceive visual distortions of digital images without access to their undistorted references. NR-IQA models are extensively studied in computational vision, and are widely used for performance evaluation and perceptual optimization of man-made vision systems. Here we make one of the first attempts to examine the perceptual robustness of NR-IQA models. Under a Lagrangian formulation, we identify insightful connections of the proposed perceptual attack to previous beautiful ideas in computer vision and machine learning.
  • Understanding Adversarial Robustness Against On-manifold Adversarial Examples. [paper]

    • Jiancong Xiao, Liusha Yang, Yanbo Fan, Jue Wang, Zhi-Quan Luo.
    • Key Word: Adversarial Robustness; On-Manifold Adversarial Examples.
    • Digest We revisit the off-manifold assumption and want to study a question: at what level is the poor performance of neural networks against adversarial attacks due to on-manifold adversarial examples? Since the true data manifold is unknown in practice, we consider two approximated on-manifold adversarial examples on both real and synthetic datasets. On real datasets, we show that on-manifold adversarial examples have greater attack rates than off-manifold adversarial examples on both standard-trained and adversarially-trained models. On synthetic datasets, we theoretically prove that on-manifold adversarial examples are powerful, yet adversarial training focuses on off-manifold directions and ignores the on-manifold adversarial examples.
  • Adaptive Weight Decay: On The Fly Weight Decay Tuning for Improving Robustness. [paper]

    • Amin Ghiasi, Ali Shafahi, Reza Ardekani.
    • Key Word: Weight Decay; Adversarial Robustness; Learning with Label Noise.
    • Digest We introduce adaptive weight decay, which automatically tunes the hyper-parameter for weight decay during each training iteration. For classification problems, we propose changing the value of the weight decay hyper-parameter on the fly based on the strength of updates from the classification loss (i.e., gradient of cross-entropy), and the regularization loss (i.e., ℓ2-norm of the weights). We show that this simple modification can result in large improvements in adversarial robustness -- an area which suffers from robust overfitting -- without requiring extra data.
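
A minimal sketch of this kind of on-the-fly weight-decay tuning follows. The proportional rule `base_wd * ||grad|| / ||w||` and the default `base_wd` are illustrative assumptions, not the authors' exact recipe.

```python
import torch

def adaptive_weight_decay_step(model, loss_fn, x, y, optimizer, base_wd=5e-4):
    """One training step with a weight-decay coefficient rescaled on the fly."""
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()

    params = [p for p in model.parameters() if p.grad is not None]
    # Strength of the classification update: norm of the loss gradient.
    grad_norm = torch.sqrt(sum((p.grad ** 2).sum() for p in params))
    # Strength of the regularization signal: l2 norm of the weights.
    weight_norm = torch.sqrt(sum((p ** 2).sum() for p in params))

    # Assumed proportional rule: larger loss gradients -> stronger weight decay.
    lambda_wd = base_wd * (grad_norm / (weight_norm + 1e-12)).item()

    with torch.no_grad():
        for p in params:
            p.grad.add_(p, alpha=lambda_wd)  # l2 penalty added to the gradient
    optimizer.step()
    return loss.item(), lambda_wd
```
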
  • Your Out-of-Distribution Detection Method is Not Robust! [paper] [code]

    • Mohammad Azizmalayeri, Arshia Soltani Moakhar, Arman Zarei, Reihaneh Zohrabi, Mohammad Taghi Manzuri, Mohammad Hossein Rohban. NeurIPS 2022
    • Key Word: Out-of-Distribution Detection; Adversarial Robustness.
    • Digest We re-examine these defenses against an end-to-end PGD attack on in/out data with larger perturbation sizes, e.g. up to commonly used ϵ=8/255 for the CIFAR-10 dataset. Surprisingly, almost all of these defenses perform worse than a random detection under the adversarial setting.
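
For reference, a minimal L-infinity PGD sketch of the end-to-end attack setting described above (budgets such as ϵ = 8/255). The `score_fn` interface is an assumption: it should return the scalar objective the attacker wants to increase, e.g. the cross-entropy of a classifier or the negated outlier score of an OOD detector.

```python
import torch

def pgd_linf(score_fn, x, y, eps=8/255, alpha=2/255, steps=10):
    """Maximize the scalar score_fn(x_adv, y) within an L-inf ball of radius eps (pixels in [0, 1])."""
    x = x.clone().detach()
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        grad, = torch.autograd.grad(score_fn(x_adv, y), x_adv)
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()          # ascend the attacker's objective
            x_adv = x + (x_adv - x).clamp(-eps, eps)     # project back into the eps ball
            x_adv = x_adv.clamp(0, 1)                    # stay in the valid pixel range
        x_adv = x_adv.detach()
    return x_adv
```

For example, `score_fn = lambda x_adv, y: torch.nn.functional.cross_entropy(model(x_adv), y)` attacks a classifier; a detector can be attacked by returning its (negated) OOD score instead.
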
  • A Survey on Physical Adversarial Attack in Computer Vision. [paper]

    • Donghua Wang, Wen Yao, Tingsong Jiang, Guijiang Tang, Xiaoqian Chen.
    • Key Word: Physical Adversarial Attacks; Survey.
    • Digest We review the development of physical adversarial attacks in DNN-based computer vision tasks, including image recognition tasks, object detection tasks, and semantic segmentation. For the sake of completeness of the algorithm evolution, we will briefly introduce the works that do not involve the physical adversarial attack.
  • GAMA: Generative Adversarial Multi-Object Scene Attacks. [paper]

    • Abhishek Aich, Calvin Khang-Ta, Akash Gupta, Chengyu Song, Srikanth V. Krishnamurthy, M. Salman Asif, Amit K. Roy-Chowdhury.
    • Key Word: Multi-object Scene based Generative Adversarial Attack; Multi-Modal Machine Learning.
    • Digest This paper presents the first approach of using generative models for adversarial attacks on multi-object scenes. In order to represent the relationships between different objects in the input scene, we leverage upon the open-sourced pre-trained vision-language model CLIP (Contrastive Language-Image Pre-training), with the motivation to exploit the encoded semantics in the language space along with the visual space. We call this attack approach Generative Adversarial Multi-object scene Attacks (GAMA). GAMA demonstrates the utility of the CLIP model as an attacker's tool to train formidable perturbation generators for multi-object scenes.
  • Part-Based Models Improve Adversarial Robustness. [paper] [code]

    • Chawin Sitawarin, Kornrapat Pongmala, Yizheng Chen, Nicholas Carlini, David Wagner.
    • Key Word: Adversarial Robustness; Part-Based Models.
    • Digest We believe that the richer form of annotation helps guide neural networks to learn more robust features without requiring more samples or larger models. Our model combines a part segmentation model with a tiny classifier and is trained end-to-end to simultaneously segment objects into parts and then classify the segmented object.
  • AdvDO: Realistic Adversarial Attacks for Trajectory Prediction. [paper]

    • Yulong Cao, Chaowei Xiao, Anima Anandkumar, Danfei Xu, Marco Pavone.
    • Key Word: Adversarial Attacks; Trajectory Prediction.
    • Digest Trajectory prediction is essential for autonomous vehicles (AVs) to plan correct and safe driving behaviors. While many prior works aim to achieve higher prediction accuracy, few study the adversarial robustness of their methods. To bridge this gap, we propose to study the adversarial robustness of data-driven trajectory prediction systems. We devise an optimization-based adversarial attack framework that leverages a carefully-designed differentiable dynamic model to generate realistic adversarial trajectories.
  • Enhance the Visual Representation via Discrete Adversarial Training. [paper] [code]

    • Xiaofeng Mao, Yuefeng Chen, Ranjie Duan, Yao Zhu, Gege Qi, Shaokai Ye, Xiaodan Li, Rong Zhang, Hui Xue. NeurIPS 2022
    • Key Word: Adversarial Training; Discrete Visual Representation Learning.
    • Digest We propose Discrete Adversarial Training (DAT). DAT leverages VQGAN to reform the image data to discrete text-like inputs, i.e. visual words. Then it minimizes the maximal risk on such discrete images with symbolic adversarial perturbations. We further give an explanation from the perspective of distribution to demonstrate the effectiveness of DAT.
  • Explicit Tradeoffs between Adversarial and Natural Distributional Robustness. [paper]

    • Mazda Moayeri, Kiarash Banihashem, Soheil Feizi. NeurIPS 2022
    • Key Word: Natural Distributional Robustness; Adversarial Robustness; Spurious Correlations.
    • Digest We bridge this gap and show that in fact, explicit tradeoffs exist between adversarial and natural distributional robustness. We first consider a simple linear regression setting on Gaussian data with disjoint sets of core and spurious features. In this setting, through theoretical and empirical analysis, we show that (i) adversarial training with ℓ1 and ℓ2 norms increases the model reliance on spurious features; (ii) For ℓ∞ adversarial training, spurious reliance only occurs when the scale of the spurious features is larger than that of the core features; (iii) adversarial training can have an unintended consequence in reducing distributional robustness, specifically when spurious correlations are changed in the new test domain.
  • A Light Recipe to Train Robust Vision Transformers. [paper] [code]

    • Edoardo Debenedetti, Vikash Sehwag, Prateek Mittal.
    • Key Word: Adversarially Robust Vision Transformers.
    • Digest We ask whether Vision Transformers (ViTs) can serve as an underlying architecture for improving the adversarial robustness of machine learning models against evasion attacks. While earlier works have focused on improving Convolutional Neural Networks, we show that also ViTs are highly suitable for adversarial training to achieve competitive performance. We achieve this objective using a custom adversarial training recipe, discovered using rigorous ablation studies on a subset of the ImageNet dataset. The canonical training recipe for ViTs recommends strong data augmentation, in part to compensate for the lack of vision inductive bias of attention modules, when compared to convolutions.
  • Adversarially Robust Learning: A Generic Minimax Optimal Learner and Characterization. [paper]

    • Omar Montasser, Steve Hanneke, Nathan Srebro. NeurIPS 2022
    • Key Word: Adversarial Robustness.
    • Digest We present a minimax optimal learner for the problem of learning predictors robust to adversarial examples at test-time. Interestingly, we find that this requires new algorithmic ideas and approaches to adversarially robust learning. In particular, we show, in a strong negative sense, the suboptimality of the robust learner proposed by Montasser, Hanneke, and Srebro (2019) and a broader family of learners we identify as local learners. Our results are enabled by adopting a global perspective, specifically, through a key technical contribution: the global one-inclusion graph, which may be of independent interest, that generalizes the classical one-inclusion graph due to Haussler, Littlestone, and Warmuth (1994).
  • On the interplay of adversarial robustness and architecture components: patches, convolution and attention. [paper]

    • Francesco Croce, Matthias Hein.
    • Key Word: Adversarial Robustness.
    • Digest In recent years novel architecture components for image classification have been developed, starting with attention and patches used in transformers. While prior works have analyzed the influence of some aspects of architecture components on the robustness to adversarial attacks, in particular for vision transformers, the understanding of the main factors is still limited. We compare several (non)-robust classifiers with different architectures and study their properties, including the effect of adversarial training on the interpretability of the learnt features and robustness to unseen threat models.
  • Certified Defences Against Adversarial Patch Attacks on Semantic Segmentation. [paper]

    • Maksym Yatsura, Kaspar Sakmann, N. Grace Hua, Matthias Hein, Jan Hendrik Metzen.
    • Key Word: Certified Recovery; Certified Detection; Image Reconstruction.
    • Digest Adversarial patch attacks are an emerging security threat for real-world deep learning applications. We present Demasked Smoothing, the first approach (to our knowledge) to certify the robustness of semantic segmentation models against this threat model. Previous work on certifiably defending against patch attacks has mostly focused on the image classification task and often required changes in the model architecture and additional training, which is undesirable and computationally expensive. In Demasked Smoothing, any segmentation model can be applied without particular training, fine-tuning, or restriction of the architecture.
  • Adversarial Coreset Selection for Efficient Robust Training. [paper]

    • Hadi M. Dolatabadi, Sarah Erfani, Christopher Leckie. ECCV 2022
    • Key Word: Adversarial Training; Coreset Selection.
    • Digest By leveraging the theory of coreset selection, we show how selecting a small subset of training data provides a principled approach to reducing the time complexity of robust training. To this end, we first provide convergence guarantees for adversarial coreset selection. In particular, we show that the convergence bound is directly related to how well our coresets can approximate the gradient computed over the entire training data. Motivated by our theoretical analysis, we propose using this gradient approximation error as our adversarial coreset selection objective to reduce the training set size effectively.
  • The Space of Adversarial Strategies. [paper]

    • Ryan Sheatsley, Blaine Hoak, Eric Pauley, Patrick McDaniel. USENIX Security 2022
    • Key Word: Adversarial Attacks; Benchmark.
    • Digest We propose a systematic approach to characterize worst-case (i.e., optimal) adversaries. We first introduce an extensible decomposition of attacks in adversarial machine learning by atomizing attack components into surfaces and travelers. With our decomposition, we enumerate over components to create 576 attacks (568 of which were previously unexplored). Next, we propose the Pareto Ensemble Attack (PEA): a theoretical attack that upper-bounds attack performance.
  • Securing the Spike: On the Transferability and Security of Spiking Neural Networks to Adversarial Examples. [paper]

    • Key Word: Spiking Neural Networks; Transferable Adversarial Examples.
    • Digest We advance the field of adversarial machine learning through experimentation and analyses of three important SNN security attributes. First, we show that successful white-box adversarial attacks on SNNs are highly dependent on the underlying surrogate gradient technique. Second, we analyze the transferability of adversarial examples generated by SNNs and other state-of-the-art architectures like Vision Transformers and Big Transfer CNNs. We demonstrate that SNNs are not often deceived by adversarial examples generated by Vision Transformers and certain types of CNNs.
  • Bag of Tricks for FGSM Adversarial Training. [paper] [code]

    • Zichao Li, Li Liu, Zeyu Wang, Yuyin Zhou, Cihang Xie.
    • Key Word: Fast Adversarial Training.
    • Digest Existing methods use gradient regularizers or random initialization tricks to attenuate this issue, but they either incur high computational cost or lead to lower robust accuracy. In this work, we provide the first study that thoroughly examines a collection of tricks from three perspectives: Data Initialization, Network Structure, and Optimization, to overcome catastrophic overfitting in FGSM-AT. Surprisingly, we find that simple tricks, i.e., a) masking partial pixels (even without randomness), b) setting a large convolution stride and smooth activation functions, or c) regularizing the weights of the first convolutional layer, can effectively tackle the overfitting issue.
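
A minimal sketch of single-step FGSM adversarial training with the pixel-masking trick (a) mentioned above. The mask ratio and the loop details are assumptions for illustration, not the paper's exact setup.

```python
import torch
import torch.nn.functional as F

def fgsm_at_step(model, x, y, optimizer, eps=8/255, mask_ratio=0.4):
    """One FGSM adversarial-training step with part of the perturbation masked out."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(loss, x)

    delta = eps * grad.sign()
    # Trick (a): zero out a fixed fraction of the perturbation pixels.
    keep = (torch.rand_like(delta) > mask_ratio).float()
    x_adv = (x.detach() + delta * keep).clamp(0, 1)

    optimizer.zero_grad()
    adv_loss = F.cross_entropy(model(x_adv), y)
    adv_loss.backward()
    optimizer.step()
    return adv_loss.item()
```
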
  • A Novel Plug-and-Play Approach for Adversarially Robust Generalization. [paper]

    • Deepak Maurya, Adarsh Barik, Jean Honorio.
    • Key Word: Adversarial Robustness.
    • Digest Our main focus is to provide a plug-and-play solution that can be incorporated in the existing machine learning algorithms with minimal changes. To that end, we derive the closed-form ready-to-use solution for several widely used loss functions with a variety of norm constraints on adversarial perturbation. Finally, we validate our approach by showing significant performance improvement on real-world datasets for supervised problems such as regression and classification, as well as for unsupervised problems such as matrix completion and learning graphical models, with very little computational overhead.
  • Adversarial Attacks on Image Generation With Made-Up Words. [paper]

    • Raphaël Millière.
    • Key Word: Adversarial Attacks; Text-Guided Image Generation; Prompting.
    • Digest Text-guided image generation models can be prompted to generate images using nonce words adversarially designed to robustly evoke specific visual concepts. Two approaches for such generation are introduced: macaronic prompting, which involves designing cryptic hybrid words by concatenating subword units from different languages; and evocative prompting, which involves designing nonce words whose broad morphological features are similar enough to that of existing words to trigger robust visual associations. The two methods can also be combined to generate images associated with more specific visual concepts. The implications of these techniques for the circumvention of existing approaches to content moderation, and particularly the generation of offensive or harmful images, are discussed.
  • Federated Adversarial Learning: A Framework with Convergence Analysis. [paper]

    • Xiaoxiao Li, Zhao Song, Jiaming Yang.
    • Key Word: Federated Learning; Adversarial Robustness; Convergence via Over-parameterization.
    • Digest We formulate a general form of federated adversarial learning (FAL) that is adapted from adversarial learning in the centralized setting. On the client side of FL training, FAL has an inner loop to generate adversarial samples for adversarial training and an outer loop to update local model parameters. On the server side, FAL aggregates local model updates and broadcasts the aggregated model. We design a global robust training loss and formulate FAL training as a min-max optimization problem. Unlike the convergence analysis in classical centralized training that relies on the gradient direction, it is significantly harder to analyze the convergence in FAL for three reasons: 1) the complexity of min-max optimization, 2) the model not updating in the gradient direction due to the multi-local updates on the client side before aggregation, and 3) inter-client heterogeneity. We address these challenges by using appropriate gradient approximation and coupling techniques and present the convergence analysis in the over-parameterized regime.
  • Understanding Adversarial Robustness of Vision Transformers via Cauchy Problem. [paper] [code]

    • Zheng Wang, Wenjie Ruan. ECML-PKDD 2022
    • Key Word: Vision Transformers; Cauchy Problem; Adversarial Robustness.
    • Digest We aim to introduce a principled and unified theoretical framework to investigate such an argument on ViT's robustness. We first theoretically prove that, unlike Transformers in Natural Language Processing, ViTs are Lipschitz continuous. Then we theoretically analyze the adversarial robustness of ViTs from the perspective of the Cauchy Problem, via which we can quantify how the robustness propagates through layers.
  • Is current research on adversarial robustness addressing the right problem? [paper]

    • Ali Borji.
    • Key Word: Adversarial Robustness; Out-of-Distribution Generalization.
    • Digest Short answer: Yes, Long answer: No! Indeed, research on adversarial robustness has led to invaluable insights helping us understand and explore different aspects of the problem. Many attacks and defenses have been proposed over the last couple of years. The problem, however, remains largely unsolved and poorly understood. Here, I argue that the current formulation of the problem serves short term goals, and needs to be revised for us to achieve bigger gains. Specifically, the bound on perturbation has created a somewhat contrived setting and needs to be relaxed. This has misled us to focus on model classes that are not expressive enough to begin with. Instead, inspired by human vision and the fact that we rely more on robust features such as shape, vertices, and foreground objects than non-robust features such as texture, efforts should be steered towards looking for significantly different classes of models.
  • LGV: Boosting Adversarial Example Transferability from Large Geometric Vicinity. [paper] [code]

    • Martin Gubri, Maxime Cordy, Mike Papadakis, Yves Le Traon, Koushik Sen. ECCV 2022
    • Key Word: Adversarial Transferability.
    • Digest We propose transferability from Large Geometric Vicinity (LGV), a new technique to increase the transferability of black-box adversarial attacks. LGV starts from a pretrained surrogate model and collects multiple weight sets from a few additional training epochs with a constant and high learning rate. LGV exploits two geometric properties that we relate to transferability.
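
A hedged sketch of the weight-collection step described above: continue training a pretrained surrogate with a constant, relatively high learning rate and snapshot its weights periodically, then attack the resulting ensemble (for instance by averaging logits). The learning rate, epoch count, and snapshot frequency below are illustrative assumptions, not the paper's tuned values.

```python
import copy
import torch
import torch.nn.functional as F

def collect_lgv_style_snapshots(model, train_loader, epochs=10, lr=0.05, snapshots_per_epoch=4):
    """Run a few extra epochs with a constant high LR and save periodic weight snapshots."""
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    every = max(1, len(train_loader) // snapshots_per_epoch)
    snapshots = []
    for _ in range(epochs):
        for step, (x, y) in enumerate(train_loader):
            opt.zero_grad()
            F.cross_entropy(model(x), y).backward()
            opt.step()
            if step % every == 0:
                snapshots.append(copy.deepcopy(model.state_dict()))
    return snapshots  # craft transferable examples against this set of surrogate weights
```
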
  • Improving Adversarial Robustness via Mutual Information Estimation. [paper] [code]

    • Dawei Zhou, Nannan Wang, Xinbo Gao, Bo Han, Xiaoyu Wang, Yibing Zhan, Tongliang Liu. ICML 2022
    • Key Word: Mutual information; Adversarial Robustness.
    • Digest We investigate the dependence between outputs of the target model and input adversarial samples from the perspective of information theory, and propose an adversarial defense method. Specifically, we first measure the dependence by estimating the mutual information (MI) between outputs and the natural patterns of inputs (called natural MI) and MI between outputs and the adversarial patterns of inputs (called adversarial MI), respectively.
  • Can we achieve robustness from data alone? [paper]

    • Nikolaos Tsilivis, Jingtong Su, Julia Kempe.
    • Key Word: Dataset Distillation; Distributionally Robust Optimization; Adversarial Augmentation; Adversarial Robustness.
    • Digest We devise a meta-learning method for robust classification, that optimizes the dataset prior to its deployment in a principled way, and aims to effectively remove the non-robust parts of the data. We cast our optimization method as a multi-step PGD procedure on kernel regression, with a class of kernels that describe infinitely wide neural nets (Neural Tangent Kernels - NTKs).
  • Proving Common Mechanisms Shared by Twelve Methods of Boosting Adversarial Transferability. [paper]

    • Quanshi Zhang, Xin Wang, Jie Ren, Xu Cheng, Shuyun Lin, Yisen Wang, Xiangming Zhu.
    • Key Word: Adversarial Transferability; Interaction.
    • Digest This paper summarizes the common mechanism shared by twelve previous transferability-boosting methods in a unified view, i.e., these methods all reduce game-theoretic interactions between regional adversarial perturbations. To this end, we focus on the attacking utility of all interactions between regional adversarial perturbations, and we first discover and prove the negative correlation between the adversarial transferability and the attacking utility of interactions.
  • Calibrated ensembles can mitigate accuracy tradeoffs under distribution shift. [paper]

    • Ananya Kumar, Tengyu Ma, Percy Liang, Aditi Raghunathan. UAI 2022
    • Key Word: Calibration; Distribution Shift.
    • Digest We find that ID-calibrated ensembles -- where we simply ensemble the standard and robust models after calibrating on only ID data -- outperforms prior state-of-the-art (based on self-training) on both ID and OOD accuracy. On eleven natural distribution shift datasets, ID-calibrated ensembles obtain the best of both worlds: strong ID accuracy and OOD accuracy. We analyze this method in stylized settings, and identify two important conditions for ensembles to perform well both ID and OOD: (1) we need to calibrate the standard and robust models (on ID data, because OOD data is unavailable), (2) OOD has no anticorrelated spurious features.
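
A minimal sketch of the ID-calibration recipe: fit a temperature for each model on in-distribution validation logits, then average the calibrated probabilities. The LBFGS-based temperature fit is a common choice assumed here for illustration, not necessarily the authors' implementation.

```python
import torch
import torch.nn.functional as F

def fit_temperature(logits, labels, iters=50):
    """Fit a scalar temperature on (detached) ID validation logits."""
    log_t = torch.zeros(1, requires_grad=True)
    opt = torch.optim.LBFGS([log_t], lr=0.1, max_iter=iters)

    def closure():
        opt.zero_grad()
        loss = F.cross_entropy(logits / log_t.exp(), labels)
        loss.backward()
        return loss

    opt.step(closure)
    return log_t.exp().detach()

def calibrated_ensemble_probs(logits_standard, logits_robust, t_standard, t_robust):
    """Average the temperature-scaled probabilities of the standard and robust models."""
    p_std = F.softmax(logits_standard / t_standard, dim=-1)
    p_rob = F.softmax(logits_robust / t_robust, dim=-1)
    return 0.5 * (p_std + p_rob)
```
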
  • Prior-Guided Adversarial Initialization for Fast Adversarial Training. [paper] [code]

    • Xiaojun Jia, Yong Zhang, Xingxing Wei, Baoyuan Wu, Ke Ma, Jue Wang, Xiaochun Cao. ECCV 2022
    • Key Word: Fast Adversarial Training; Regularization.
    • Digest We explore the difference between the training processes of SAT and FAT and observe that the attack success rate of adversarial examples (AEs) of FAT gets worse gradually in the late training stage, resulting in overfitting. The AEs are generated by the fast gradient sign method (FGSM) with a zero or random initialization. Based on the observation, we propose a prior-guided FGSM initialization method to avoid overfitting after investigating several initialization strategies, improving the quality of the AEs during the whole training process.
  • Watermark Vaccine: Adversarial Attacks to Prevent Watermark Removal. [paper] [code]

    • Xinwei Liu, Jian Liu, Yang Bai, Jindong Gu, Tao Chen, Xiaojun Jia, Xiaochun Cao. ECCV 2022
    • Key Word: Adversarial Attacks; Visible Watermark Removal; Watermark Protection.
    • Digest As a common security tool, visible watermarking has been widely applied to protect copyrights of digital images. However, recent works have shown that visible watermarks can be removed by DNNs without damaging their host images. Such watermark-removal techniques pose a great threat to the ownership of images. Inspired by the vulnerability of DNNs on adversarial perturbations, we propose a novel defence mechanism by adversarial machine learning for good. From the perspective of the adversary, blind watermark-removal networks can be posed as our target models; then we actually optimize an imperceptible adversarial perturbation on the host images to proactively attack against watermark-removal networks, dubbed Watermark Vaccine.
  • Adversarially-Aware Robust Object Detector. [paper] [code]

    • Ziyi Dong, Pengxu Wei, Liang Lin. ECCV 2022
    • Key Word: Adversarial Robustness; Object Detection.
    • Digest We empirically explore model training for adversarial robustness in object detection, whose difficulty is largely attributed to the conflict between learning clean images and learning adversarial images. To mitigate this issue, we propose a Robust Detector (RobustDet) based on adversarially-aware convolution to disentangle gradients for model learning on clean and adversarial images. RobustDet also employs the Adversarial Image Discriminator (AID) and Consistent Features with Reconstruction (CFR) to ensure reliable robustness.
  • Frequency Domain Model Augmentation for Adversarial Attack. [paper] [code]

    • Yuyang Long, Qilong Zhang, Boheng Zeng, Lianli Gao, Xianglong Liu, Jian Zhang, Jingkuan Song. ECCV 2022
    • Key Word: Frequency; Adversarial Attacks.
    • Digest For black-box attacks, the gap between the substitute model and the victim model is usually large, which manifests as a weak attack performance. Motivated by the observation that the transferability of adversarial examples can be improved by attacking diverse models simultaneously, model augmentation methods which simulate different models by using transformed images are proposed. However, existing transformations for spatial domain do not translate to significantly diverse augmented models. To tackle this issue, we propose a novel spectrum simulation attack to craft more transferable adversarial examples against both normally trained and defense models.
  • Not all broken defenses are equal: The dead angles of adversarial accuracy. [paper]

    • Raphael Olivier, Bhiksha Raj.
    • Key Word: Adversarial Defenses.
    • Digest Many defenses, when evaluated against a strong attack, do not provide accuracy improvements while still contributing partially to adversarial robustness. Popular certification methods suffer from the same issue, as they provide a lower bound to accuracy. To capture finer robustness properties we propose a new metric for L2 robustness, adversarial angular sparsity, which partially answers the question "how many adversarial examples are there around an input". We demonstrate its usefulness by evaluating both "strong" and "weak" defenses. We show that some state-of-the-art defenses, delivering very similar accuracy, can have very different sparsity on the inputs that they are not robust on. We also show that some weak defenses actually decrease robustness, while others strengthen it in a measure that accuracy cannot capture.
  • Demystifying the Adversarial Robustness of Random Transformation Defenses. [paper] [code]

    • Chawin Sitawarin, Zachary Golan-Strieb, David Wagner. ICML 2022
    • Key Word: Adversarial Defenses; Random Transformation.
    • Digest Defenses using random transformations (RT) have shown impressive results, particularly BaRT (Raff et al., 2019) on ImageNet. However, this type of defense has not been rigorously evaluated, leaving its robustness properties poorly understood. Their stochastic properties make evaluation more challenging and render many proposed attacks on deterministic models inapplicable. First, we show that the BPDA attack (Athalye et al., 2018a) used in BaRT's evaluation is ineffective and likely overestimates its robustness. We then attempt to construct the strongest possible RT defense through the informed selection of transformations and Bayesian optimization for tuning their parameters. Furthermore, we create the strongest possible attack to evaluate our RT defense.
  • Removing Batch Normalization Boosts Adversarial Training. [paper] [code]

    • Key Word: Batch Normalization; Adversarial Training.
    • Digest Our normalizer-free robust training (NoFrost) method extends recent advances in normalizer-free networks to AT for its unexplored advantage on handling the mixture distribution challenge. We show that NoFrost achieves adversarial robustness with only a minor sacrifice on clean sample accuracy. On ImageNet with ResNet50, NoFrost achieves 74.06% clean accuracy, which drops merely 2.00% from standard training. In contrast, BN-based AT obtains 59.28% clean accuracy, suffering a significant 16.78% drop from standard training.
  • Efficient Adversarial Training With Data Pruning. [paper]

    • Maximilian Kaufmann, Yiren Zhao, Ilia Shumailov, Robert Mullins, Nicolas Papernot.
    • Key Word: Adversarial Training; Data Pruning.
    • Digest We demonstrate data pruning, a method for increasing adversarial training efficiency through data sub-sampling. We empirically show that data pruning leads to improvements in convergence and reliability of adversarial training, albeit with different levels of utility degradation. For example, we observe that by using random sub-sampling of CIFAR10 to drop 40% of the data, we lose 8% adversarial accuracy against the strongest attackers, while by using only 20% of the data we lose 14% adversarial accuracy and reduce runtime by a factor of 3. Interestingly, we discover that in some settings data pruning brings benefits from both worlds: it improves both adversarial accuracy and training time.
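
A minimal sketch of the random sub-sampling variant of data pruning discussed above: keep a fraction of CIFAR-10 and run adversarial training only on that subset. The dataset path, transform, and loader settings are assumptions.

```python
import torch
from torch.utils.data import Subset, DataLoader
from torchvision import datasets, transforms

def pruned_cifar10_loader(keep_fraction=0.6, batch_size=128, root="./data"):
    """Return a loader over a random keep_fraction subset of the CIFAR-10 training set."""
    full = datasets.CIFAR10(root=root, train=True, download=True,
                            transform=transforms.ToTensor())
    n_keep = int(keep_fraction * len(full))
    keep_idx = torch.randperm(len(full))[:n_keep].tolist()
    return DataLoader(Subset(full, keep_idx), batch_size=batch_size,
                      shuffle=True, num_workers=2)
```
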
  • Adversarial Robustness is at Odds with Lazy Training. [paper]

    • Yunjuan Wang, Enayat Ullah, Poorya Mianjy, Raman Arora.
    • Key Word: Adversarial Robustness; Lazy Training.
    • Digest Recent works show that random neural networks are vulnerable against adversarial attacks [Daniely and Schacham, 2020] and that such attacks can be easily found using a single step of gradient descent [Bubeck et al., 2021]. In this work, we take it one step further and show that a single gradient step can find adversarial examples for networks trained in the so-called lazy regime. This regime is interesting because even though the neural network weights remain close to the initialization, there exist networks with small generalization error, which can be found efficiently using first-order methods. Our work challenges the model of the lazy regime, the dominant regime in which neural networks are provably efficiently learnable. We show that the networks trained in this regime, even though they enjoy good theoretical computational guarantees, remain vulnerable to adversarial examples.
  • Increasing Confidence in Adversarial Robustness Evaluations. [paper]

    • Roland S. Zimmermann, Wieland Brendel, Florian Tramer, Nicholas Carlini.
    • Key Word: Adversarial Robustness.
    • Digest Hundreds of defenses have been proposed to make deep neural networks robust against minimal (adversarial) input perturbations. However, only a handful of these defenses held up their claims because correctly evaluating robustness is extremely challenging: Weak attacks often fail to find adversarial examples even if they unknowingly exist, thereby making a vulnerable network look robust. In this paper, we propose a test to identify weak attacks, and thus weak defense evaluations. Our test slightly modifies a neural network to guarantee the existence of an adversarial example for every sample. Consequentially, any correct attack must succeed in breaking this modified network.
  • Defending Multimodal Fusion Models against Single-Source Adversaries. [paper]

    • Karren Yang, Wan-Yi Lin, Manash Barman, Filipe Condessa, Zico Kolter. CVPR 2021
    • Key Word: Adversarial Robustness; Multimodal Fusion Models.
    • Digest We investigate the robustness of multimodal neural networks against worst-case (i.e., adversarial) perturbations on a single modality. We first show that standard multimodal fusion models are vulnerable to single-source adversaries: an attack on any single modality can overcome the correct information from multiple unperturbed modalities and cause the model to fail. This surprising vulnerability holds across diverse multimodal tasks and necessitates a solution. Motivated by this finding, we propose an adversarially robust fusion strategy that trains the model to compare information coming from all the input sources, detect inconsistencies in the perturbed modality compared to the other modalities, and only allow information from the unperturbed modalities to pass through.
  • Adversarial Robustness of Deep Neural Networks: A Survey from a Formal Verification Perspective. [paper]

    • Mark Huasong Meng, Guangdong Bai, Sin Gee Teo, Zhe Hou, Yan Xiao, Yun Lin, Jin Song Dong.
    • Key Word: Adversarial Robustness; Survey.
    • Digest We survey existing literature in adversarial robustness verification for neural networks and collect 39 diversified research works across machine learning, security, and software engineering domains. We systematically analyze their approaches, including how robustness is formulated, what verification techniques are used, and the strengths and limitations of each technique. We provide a taxonomy from a formal verification perspective for a comprehensive understanding of this topic. We classify the existing techniques based on property specification, problem reduction, and reasoning strategies.
  • Measuring Representational Robustness of Neural Networks Through Shared Invariances. [paper] [code]

    • Vedant Nanda, Till Speicher, Camila Kolling, John P. Dickerson, Krishna P. Gummadi, Adrian Weller. ICML 2022
    • Key Word: Representational Similarity; Adversarial Robustness.
    • Digest A major challenge in studying robustness in deep learning is defining the set of "meaningless" perturbations to which a given Neural Network (NN) should be invariant. Most work on robustness implicitly uses a human as the reference model to define such perturbations. Our work offers a new view on robustness by using another reference NN to define the set of perturbations a given NN should be invariant to, thus generalizing the reliance on a reference "human NN" to any NN. This makes measuring robustness equivalent to measuring the extent to which two NNs share invariances, for which we propose a measure called STIR. STIR re-purposes existing representation similarity measures to make them suitable for measuring shared invariances.
  • Adversarially trained neural representations may already be as robust as corresponding biological neural representations. [paper]

    • Chong Guo, Michael J. Lee, Guillaume Leclerc, Joel Dapello, Yug Rao, Aleksander Madry, James J. DiCarlo.
    • Key Word: Adversarial Robustness; Biological Neural Representation.
    • Digest We develop a method for performing adversarial visual attacks directly on primate brain activity. We then leverage this method to demonstrate that the above-mentioned belief might not be well founded. Specifically, we report that the biological neurons that make up visual systems of primates exhibit susceptibility to adversarial perturbations that is comparable in magnitude to existing (robustly trained) artificial neural networks.
  • Guided Diffusion Model for Adversarial Purification from Random Noise. [paper]

    • Quanlin Wu, Hang Ye, Yuntian Gu.
    • Key Word: Adversarial Purification; Diffusion Model.
    • Digest In this paper, we propose a novel guided diffusion purification approach to provide a strong defense against adversarial attacks. Our model achieves 89.62% robust accuracy under PGD-L_inf attack (eps = 8/255) on the CIFAR-10 dataset. We first explore the essential correlations between unguided diffusion models and randomized smoothing, enabling us to apply the models to certified robustness. The empirical results show that our models outperform randomized smoothing by 5% when the certified L2 radius r is larger than 0.5.
  • Robust Universal Adversarial Perturbations. [paper]

    • Changming Xu, Gagandeep Singh.
    • Key Word: Transferable Adversarial Example; Universal Adversarial Perturbations.
    • Digest We introduce a new concept and formulation of robust universal adversarial perturbations. Based on our formulation, we build a novel, iterative algorithm that leverages probabilistic robustness bounds for generating UAPs robust against transformations generated by composing arbitrary sub-differentiable transformation functions.
  • (Certified!!) Adversarial Robustness for Free! [paper]

    • Nicholas Carlini, Florian Tramer, Krishnamurthy (Dj) Dvijotham, J. Zico Kolter.
    • Key Word: Certified Adversarial Robustness; Randomized Smoothing; Diffusion Models.
    • Digest In this paper we show how to achieve state-of-the-art certified adversarial robustness to 2-norm bounded perturbations by relying exclusively on off-the-shelf pretrained models. To do so, we instantiate the denoised smoothing approach of Salman et al. by combining a pretrained denoising diffusion probabilistic model and a standard high-accuracy classifier. This allows us to certify 71% accuracy on ImageNet under adversarial perturbations constrained to be within a 2-norm of 0.5, an improvement of 14 percentage points over the prior certified SoTA using any approach, or an improvement of 30 percentage points over denoised smoothing. We obtain these results using only pretrained diffusion models and image classifiers, without requiring any fine tuning or retraining of model parameters.
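
A hedged sketch of the denoised-smoothing prediction pipeline described above: add Gaussian noise, pass the noisy copies through an off-the-shelf denoiser, classify, and majority-vote over the noise draws. `denoiser` and `classifier` are assumed callables; the statistical certification step (confidence bounds and abstention) is omitted here.

```python
import torch

@torch.no_grad()
def smoothed_predict(denoiser, classifier, x, sigma=0.5, n_samples=100, batch=20):
    """Majority-vote prediction for a single input x of shape (1, C, H, W)."""
    votes = None
    remaining = n_samples
    while remaining > 0:
        b = min(batch, remaining)
        noise = sigma * torch.randn((b,) + x.shape[1:], device=x.device)
        noisy = x.repeat(b, 1, 1, 1) + noise
        logits = classifier(denoiser(noisy))
        counts = torch.bincount(logits.argmax(dim=1), minlength=logits.shape[1])
        votes = counts if votes is None else votes + counts
        remaining -= b
    return int(votes.argmax())
```
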
  • Understanding Robust Learning through the Lens of Representation Similarities. [paper] [code]

    • Christian Cianfarani, Arjun Nitin Bhagoji, Vikash Sehwag, Ben Zhao, Prateek Mittal.
    • Key Word: Adversarial Robustness; Representation Similarity.
    • Digest We aim to understand how the properties of representations learned by robust training differ from those obtained from standard, non-robust training. This is critical to diagnosing numerous salient pitfalls in robust networks, such as, degradation of performance on benign inputs, poor generalization of robustness, and increase in over-fitting. We utilize a powerful set of tools known as representation similarity metrics, across three vision datasets, to obtain layer-wise comparisons between robust and non-robust DNNs with different architectures, training procedures and adversarial constraints.
  • Diversified Adversarial Attacks based on Conjugate Gradient Method. [paper] [code]

    • Keiichiro Yamamura, Haruki Sato, Nariaki Tateiwa, Nozomi Hata, Toru Mitsutake, Issa Oe, Hiroki Ishikura, Katsuki Fujisawa. ICML 2022
    • Key Word: Adversarial Attacks.
    • Digest Although existing methods based on the steepest descent have achieved high attack success rates, ill-conditioned problems occasionally reduce their performance. To address this limitation, we utilize the conjugate gradient (CG) method, which is effective for this type of problem, and propose a novel attack algorithm inspired by the CG method, named the Auto Conjugate Gradient (ACG) attack. The results of large-scale evaluation experiments conducted on the latest robust models show that, for most models, ACG was able to find more adversarial examples with fewer iterations than the existing SOTA algorithm Auto-PGD (APGD).
  • On the Role of Generalization in Transferability of Adversarial Examples. [paper]

    • Yilin Wang, Farzan Farnia.
    • Key Word: Transferable Adversarial Example.
    • Digest We aim to demonstrate the role of the generalization properties of the substitute classifier used for generating adversarial examples in the transferability of the attack scheme to unobserved NN classifiers. To do this, we apply the max-min adversarial example game framework and show the importance of the generalization properties of the substitute NN in the success of the black-box attack scheme in application to different NN classifiers. We prove theoretical generalization bounds on the difference between the attack transferability rates on training and test samples.
  • Understanding Robust Overfitting of Adversarial Training and Beyond. [paper] [code]

    • Chaojian Yu, Bo Han, Li Shen, Jun Yu, Chen Gong, Mingming Gong, Tongliang Liu. ICML 2022
    • Key Word: Adversarial Training; Robust Overfitting.
    • Digest Robust overfitting widely exists in adversarial training of deep networks. The exact underlying reasons for this are still not completely understood. Here, we explore the causes of robust overfitting by comparing the data distribution of non-overfit (weak adversary) and overfitted (strong adversary) adversarial training, and observe that the adversarial data generated by a weak adversary mainly consist of small-loss data. However, the adversarial data generated by a strong adversary are more diversely distributed over both the large-loss and the small-loss data. Given these observations, we further design data-ablation adversarial training and identify that some small-loss data which are not worthy of the adversary strength cause robust overfitting in the strong adversary mode. To relieve this issue, we propose minimum loss constrained adversarial training (MLCAT): in a minibatch, we learn large-loss data as usual, and adopt additional measures to increase the loss of the small-loss data.
  • Query-Efficient and Scalable Black-Box Adversarial Attacks on Discrete Sequential Data via Bayesian Optimization. [paper]

    • Deokjae Lee, Seungyong Moon, Junhyeok Lee, Hyun Oh Song. ICML 2022
    • Key Word: Black-Box Adversarial Attacks.
    • Digest Existing black-box attacks, mostly based on greedy algorithms, find adversarial examples using pre-computed key positions to perturb, which severely limits the search space and might result in suboptimal solutions. To this end, we propose a query-efficient black-box attack using Bayesian optimization, which dynamically computes important positions using an automatic relevance determination (ARD) categorical kernel. We introduce block decomposition and history subsampling techniques to improve the scalability of Bayesian optimization when an input sequence becomes long.
  • Adversarial Patch Attacks and Defences in Vision-Based Tasks: A Survey. [paper]

    • Abhijith Sharma, Yijun Bian, Phil Munz, Apurva Narayan.
    • Key Word: Adversarial Patch Attacks and Defenses; Survey.
    • Digest Adversarial attacks in deep learning models, especially for safety-critical systems, are gaining more and more attention in recent years, due to the lack of trust in the security and robustness of AI models. Yet the more primitive adversarial attacks might be physically infeasible or require some resources that are hard to access like the training data, which motivated the emergence of patch attacks. In this survey, we provide a comprehensive overview to cover existing techniques of adversarial patch attacks, aiming to help interested researchers quickly catch up with the progress in this field. We also discuss existing techniques for developing detection and defences against adversarial patches, aiming to help the community better understand this field and its applications in the real world.
  • Catastrophic overfitting is a bug but also a feature. [paper] [code]

    • Guillermo Ortiz-Jiménez, Pau de Jorge, Amartya Sanyal, Adel Bibi, Puneet K. Dokania, Pascal Frossard, Grégory Rogez, Philip H. S. Torr.
    • Key Word: Adversarial Robustness; Robust Overfitting.
    • Digest We find that the interplay between the structure of the data and the dynamics of adversarial training (AT) plays a fundamental role in catastrophic overfitting (CO). Specifically, through active interventions on typical datasets of natural images, we establish a causal link between the structure of the data and the onset of CO in single-step AT methods. This new perspective provides important insights into the mechanisms that lead to CO and paves the way towards a better understanding of the general dynamics of robust model construction.
  • Linearity Grafting: Relaxed Neuron Pruning Helps Certifiable Robustness. [paper] [code]

    • Tianlong Chen, Huan Zhang, Zhenyu Zhang, Shiyu Chang, Sijia Liu, Pin-Yu Chen, Zhangyang Wang. ICML 2022
    • Key Word: Certified Adversarial Robustness; Pruning.
    • Digest Certifiable robustness is a highly desirable property for adopting deep neural networks (DNNs) in safety-critical scenarios, but often demands tedious computations to establish. The main hurdle lies in the massive amount of non-linearity in large DNNs. To trade off the DNN expressiveness (which calls for more non-linearity) and robustness certification scalability (which prefers more linearity), we propose a novel solution to strategically manipulate neurons, by "grafting" appropriate levels of linearity. The core of our proposal is to first linearize insignificant ReLU neurons, to eliminate the non-linear components that are both redundant for DNN performance and harmful to its certification. We then optimize the associated slopes and intercepts of the replaced linear activations for restoring model performance while maintaining certifiability. Hence, typical neuron pruning could be viewed as a special case of grafting a linear function of the fixed zero slopes and intercept, that might overly restrict the network flexibility and sacrifice its performance.
  • Adversarial Vulnerability of Randomized Ensembles. [paper] [code]

    • Hassan Dbouk, Naresh R. Shanbhag. ICML 2022
    • Key Word: Adaptive Adversarial Attacks; Ensemble Adversarial Training; Randomized Smoothing.
    • Digest Recently, works on randomized ensembles have empirically demonstrated significant improvements in adversarial robustness over standard adversarially trained (AT) models with minimal computational overhead, making them a promising solution for safety-critical resource-constrained applications. However, this impressive performance raises the question: Are these robustness gains provided by randomized ensembles real? In this work we address this question both theoretically and empirically. We first establish theoretically that commonly employed robustness evaluation methods such as adaptive PGD provide a false sense of security in this setting.
  • Meet You Halfway: Explaining Deep Learning Mysteries. [paper]

    • Oriel BenShmuel.
    • Key Word: Adversarial Robustness.
    • Digest We introduce a new conceptual framework attached with a formal description that aims to shed light on the network's behavior and interpret the behind-the-scenes of the learning process. Our framework provides an explanation for inherent questions concerning deep learning. Particularly, we clarify: (1) Why do neural networks acquire generalization abilities? (2) Why do adversarial examples transfer between different models? We provide a comprehensive set of experiments that support this new framework, as well as its underlying theory.
  • Early Transferability of Adversarial Examples in Deep Neural Networks. [paper]

    • Oriel BenShmuel.
    • Key Word: Adversarial Transferability.
    • Digest This paper will describe and analyze a new phenomenon that was not known before, which we call "Early Transferability". Its essence is that the adversarial perturbations transfer among different networks even at extremely early stages in their training. In fact, one can initialize two networks with two different independent choices of random weights and measure the angle between their adversarial perturbations after each step of the training. What we discovered was that these two adversarial directions started to align with each other already after the first few training steps (which typically use only a small fraction of the available training data), even though the accuracy of the two networks hadn't started to improve from their initial bad values due to the early stage of the training.
  • Gradient Obfuscation Gives a False Sense of Security in Federated Learning. [paper]

    • Kai Yue, Richeng Jin, Chau-Wai Wong, Dror Baron, Huaiyu Dai.
    • Key Word: Federated Learning; Adversarial Robustness; Privacy.
    • Digest We present a new data reconstruction attack framework targeting the image classification task in federated learning. We show that commonly adopted gradient postprocessing procedures, such as gradient quantization, gradient sparsification, and gradient perturbation, may give a false sense of security in federated learning. Contrary to prior studies, we argue that privacy enhancement should not be treated as a byproduct of gradient compression.
  • Building Robust Ensembles via Margin Boosting. [paper] [code]

    • Dinghuai Zhang, Hongyang Zhang, Aaron Courville, Yoshua Bengio, Pradeep Ravikumar, Arun Sai Suggala. ICML 2022
    • Key Word: Adversarial Robustness; Boosting.
    • Digest In the context of adversarial robustness, a single model does not usually have enough power to defend against all possible adversarial attacks, and as a result, has sub-optimal robustness. Consequently, an emerging line of work has focused on learning an ensemble of neural networks to defend against adversarial attacks. In this work, we take a principled approach towards building robust ensembles. We view this problem from the perspective of margin-boosting and develop an algorithm for learning an ensemble with maximum margin.
  • Adversarial Unlearning: Reducing Confidence Along Adversarial Directions. [paper] [code]

    • Amrith Setlur, Benjamin Eysenbach, Virginia Smith, Sergey Levine.
    • Key Word: Adversarial Training; Entropy Maximization.
    • Digest We propose a complementary regularization strategy that reduces confidence on self-generated examples. The method, which we call RCAD (Reducing Confidence along Adversarial Directions), aims to reduce confidence on out-of-distribution examples lying along directions adversarially chosen to increase training loss. In contrast to adversarial training, RCAD does not try to robustify the model to output the original label, but rather regularizes it to have reduced confidence on points generated using much larger perturbations than in conventional adversarial training.
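
A minimal sketch of the confidence-reduction idea summarized above: take one large step along the adversarial direction and reward high predictive entropy on the resulting points. The step size and loss weighting are illustrative assumptions rather than the paper's hyper-parameters.

```python
import torch
import torch.nn.functional as F

def rcad_style_loss(model, x, y, large_eps=16/255, weight=1.0):
    """Clean cross-entropy plus an entropy bonus on points far along the adversarial direction."""
    # Direction of steepest loss increase at the clean input.
    x_in = x.clone().detach().requires_grad_(True)
    grad, = torch.autograd.grad(F.cross_entropy(model(x_in), y), x_in)

    # Much larger step than in conventional adversarial training.
    x_far = (x.detach() + large_eps * grad.sign()).clamp(0, 1)

    probs = F.softmax(model(x_far), dim=-1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1).mean()

    # Maximizing entropy on x_far = reducing confidence along adversarial directions.
    return F.cross_entropy(model(x), y) - weight * entropy
```
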
  • An Analytic Framework for Robust Training of Artificial Neural Networks. [paper]

    • Ramin Barati, Reza Safabakhsh, Mohammad Rahmati.
    • Key Word: Adversarial Robustness; Geometric and Analytic Modeling.
    • Digest Many studies investigate the phenomenon by proposing a simplified model of how adversarial examples occur and validate it by predicting some aspect of the phenomenon. While these studies cover many different characteristics of the adversarial examples, they have not reached a holistic approach to the geometric and analytic modeling of the phenomenon. This paper proposes a formal framework to study the phenomenon in learning theory and makes use of complex analysis and holomorphicity to offer a robust learning rule for artificial neural networks.
  • Diffusion Models for Adversarial Purification. [paper] [code]

    • Weili Nie, Brandon Guo, Yujia Huang, Chaowei Xiao, Arash Vahdat, Anima Anandkumar. ICML 2022
    • Key Word: Adversarial Purification; Diffusion Models.
    • Digest We propose DiffPure that uses diffusion models for adversarial purification: Given an adversarial example, we first diffuse it with a small amount of noise following a forward diffusion process, and then recover the clean image through a reverse generative process. To evaluate our method against strong adaptive attacks in an efficient and scalable way, we propose to use the adjoint method to compute full gradients of the reverse generative process.
  • Self-Ensemble Adversarial Training for Improved Robustness. [paper] [code]

    • Hongjun Wang, Yisen Wang. ICLR 2022
    • Key Word: Adversarial Robustness.
    • Digest We are dedicated to the weight states of models through the training process and devise a simple but powerful Self-Ensemble Adversarial Training (SEAT) method for yielding a robust classifier by averaging weights of history models. This considerably improves the robustness of the target model against several well known adversarial attacks, even merely utilizing the naive cross-entropy loss to supervise.
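
A minimal sketch of averaging the weight states visited during (adversarial) training, in the spirit of the entry above; a simple exponential moving average is used as an assumed stand-in for the paper's exact averaging scheme.

```python
import copy
import torch

class WeightAverager:
    """Keep a running average of a model's weights during training."""
    def __init__(self, model, decay=0.999):
        self.avg = copy.deepcopy(model).eval()
        self.decay = decay

    @torch.no_grad()
    def update(self, model):
        for p_avg, p in zip(self.avg.parameters(), model.parameters()):
            p_avg.mul_(self.decay).add_(p, alpha=1.0 - self.decay)
        for b_avg, b in zip(self.avg.buffers(), model.buffers()):
            b_avg.copy_(b)  # keep BatchNorm statistics in sync with the live model
```

Calling `update(model)` after each optimizer step and evaluating the averaged copy (`averager.avg`) is the intended usage.
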
  • A Unified Wasserstein Distributional Robustness Framework for Adversarial Training. [paper] [code]

    • Tuan Anh Bui, Trung Le, Quan Tran, He Zhao, Dinh Phung. ICLR 2022
    • Key Word: Adversarial Robustness; Distribution Shift.
    • Digest This paper presents a unified framework that connects Wasserstein distributional robustness with current state-of-the-art AT methods. We introduce a new Wasserstein cost function and a new series of risk functions, with which we show that standard AT methods are special cases of their counterparts in our framework.
  • Make Some Noise: Reliable and Efficient Single-Step Adversarial Training. [paper] [code]

    • Pau de Jorge, Adel Bibi, Riccardo Volpi, Amartya Sanyal, Philip H. S. Torr, Grégory Rogez, Puneet K. Dokania.
    • Key Word: Adversarial Training; Robust Overfitting.
    • Digest We methodically revisit the role of noise and clipping in single-step adversarial training. Contrary to previous intuitions, we find that using a stronger noise around the clean sample combined with not clipping is highly effective in avoiding CO for large perturbation radii. Based on these observations, we then propose Noise-FGSM (N-FGSM) that, while providing the benefits of single-step adversarial training, does not suffer from CO. Empirical analyses on a large suite of experiments show that N-FGSM is able to match or surpass the performance of previous single-step methods while achieving a 3× speed-up.
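
A hedged sketch of the single-step recipe summarized above: sample relatively strong noise around the clean input, take one FGSM step, and skip the projection back to the epsilon ball (only the valid pixel range is enforced). The noise radius `k * eps` is an assumption for illustration.

```python
import torch
import torch.nn.functional as F

def n_fgsm_example(model, x, y, eps=8/255, k=2.0):
    """Single-step adversarial example with strong noise and no clipping to the eps ball."""
    eta = torch.empty_like(x).uniform_(-k * eps, k * eps)          # strong noise around x
    x_noisy = (x + eta).clamp(0, 1).detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_noisy), y)
    grad, = torch.autograd.grad(loss, x_noisy)
    x_adv = (x_noisy.detach() + eps * grad.sign()).clamp(0, 1)     # no projection back to the eps ball
    return x_adv
```
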

Poisoning Attacks and Defenses

  • FLIP: A Provable Defense Framework for Backdoor Mitigation in Federated Learning. [paper]

    • Kaiyuan Zhang, Guanhong Tao, Qiuling Xu, Siyuan Cheng, Shengwei An, Yingqi Liu, Shiwei Feng, Guangyu Shen, Pin-Yu Chen, Shiqing Ma, Xiangyu Zhang.
    • Key Word: Backdoor Defenses; Federated Learning.
    • Digest We theoretically analyze the connection among cross-entropy loss, attack success rate, and clean accuracy in this setting. Moreover, we propose a trigger reverse engineering based defense and show that our method can achieve robustness improvement with guarantee (i.e., reducing the attack success rate) without affecting benign accuracy.
  • FedRecover: Recovering from Poisoning Attacks in Federated Learning using Historical Information. [paper]

    • Xiaoyu Cao, Jinyuan Jia, Zaixi Zhang, Neil Zhenqiang Gong. S&P 2023
    • Key Word: Federated Learning; Poisoning Defenses.
    • Digest We propose FedRecover, which can recover an accurate global model from poisoning attacks with small cost for the clients. Our key idea is that the server estimates the clients' model updates instead of asking the clients to compute and communicate them during the recovery process. In particular, the server stores the global models and clients' model updates in each round, when training the poisoned global model. During the recovery process, the server estimates a client's model update in each round using its stored historical information.
  • Not All Poisons are Created Equal: Robust Training against Data Poisoning. [paper] [code]

    • Yu Yang, Tian Yu Liu, Baharan Mirzasoleiman. ICML 2022
    • Key Word: Poisoning Defenses.
    • Digest We propose an efficient defense mechanism that significantly reduces the success rate of various data poisoning attacks, and provides theoretical guarantees for the performance of the model. Targeted attacks work by adding bounded perturbations to a randomly selected subset of training data to match the targets' gradient or representation. We show that: (i) under bounded perturbations, only a number of poisons can be optimized to have a gradient that is close enough to that of the target and make the attack successful; (ii) such effective poisons move away from their original class and get isolated in the gradient space; (iii) dropping examples in low-density gradient regions during training can successfully eliminate the effective poisons, and guarantees similar training dynamics to that of training on full data.
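
A hedged, simplified sketch of the "drop isolated points in gradient space" idea above. The gradient of the cross-entropy with respect to the logits is used here as a per-example gradient proxy, and within-class k-NN distance as the (inverse) density measure; the paper's actual procedure and schedule may differ.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def isolation_scores(logits, labels, k=5):
    """Larger score = more isolated within its class in (proxy) gradient space."""
    probs = F.softmax(logits, dim=-1)
    g = probs - F.one_hot(labels, probs.shape[1]).float()   # grad of cross-entropy w.r.t. logits
    scores = torch.zeros(len(g))
    for c in labels.unique():
        idx = (labels == c).nonzero(as_tuple=True)[0]
        d = torch.cdist(g[idx], g[idx])                      # pairwise distances within the class
        knn_d = d.topk(min(k + 1, len(idx)), largest=False).values[:, -1]
        scores[idx] = knn_d                                  # distance to the k-th neighbour
    return scores

def keep_mask(scores, drop_fraction=0.1):
    """Keep everything except the most isolated (likely effective poison) examples."""
    return scores <= scores.quantile(1.0 - drop_fraction)
```
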
  • Towards Fair Classification against Poisoning Attacks. [paper]

    • Han Xu, Xiaorui Liu, Yuxuan Wan, Jiliang Tang.
    • Key Word: Poisoning Attacks; Fairness.
    • Digest We study the poisoning scenario where the attacker can insert a small fraction of samples into training data, with arbitrary sensitive attributes as well as other predictive features. We demonstrate that the fairly trained classifiers can be greatly vulnerable to such poisoning attacks, with much worse accuracy & fairness trade-off, even when we apply some of the most effective defenses (originally proposed to defend traditional classification tasks).
  • Thinking Two Moves Ahead: Anticipating Other Users Improves Backdoor Attacks in Federated Learning. [paper] [code]

    • Yuxin Wen, Jonas Geiping, Liam Fowl, Hossein Souri, Rama Chellappa, Micah Goldblum, Tom Goldstein.
    • Key Word: Backdoor Attacks; Federated Learning.
    • Digest Existing attacks do not account for future behaviors of other users, and thus require many sequential updates and their effects are quickly erased. We propose an attack that anticipates and accounts for the entire federated learning pipeline, including behaviors of other clients, and ensures that backdoors are effective quickly and persist even after multiple rounds of community updates.
  • Marksman Backdoor: Backdoor Attacks with Arbitrary Target Class. [paper]

    • Khoa D. Doan, Yingjie Lao, Ping Li. NeurIPS 2022
    • Key Word: Backdoor Attacks.
    • Digest This paper introduces a novel backdoor attack with a much more powerful payload, denoted as Marksman, where the adversary can arbitrarily choose which target class the model will misclassify given any input during inference. To achieve this goal, we propose to represent the trigger function as a class-conditional generative model and to inject the backdoor in a constrained optimization framework, where the trigger function learns to generate an optimal trigger pattern to attack any target class at will while simultaneously embedding this generative backdoor into the trained model.
  • Trap and Replace: Defending Backdoor Attacks by Trapping Them into an Easy-to-Replace Subnetwork. [paper]

    • Haotao Wang, Junyuan Hong, Aston Zhang, Jiayu Zhou, Zhangyang Wang. NeurIPS 2022
    • Key Word: Backdoor Defenses.
    • Digest We propose a brand-new backdoor defense strategy, which makes it much easier to remove the harmful influence of backdoor samples from the model. Our defense strategy, Trap and Replace, consists of two stages. In the first stage, we bait and trap the backdoors in a small and easy-to-replace subnetwork. In the second stage, we replace the poisoned lightweight classification head with an untainted one, by re-training it from scratch only on a small holdout dataset of clean samples, while keeping the stem network fixed.
  • Few-shot Backdoor Attacks via Neural Tangent Kernels. [paper]

    • Jonathan Hayase, Sewoong Oh.
    • Key Word: Backdoor Attacks; Neural Tangent Kernel.
    • Digest In a backdoor attack, an attacker injects corrupted examples into the training set. The goal of the attacker is to cause the final trained model to predict the attacker's desired target label when a predefined trigger is added to test inputs. Central to these attacks is the trade-off between the success rate of the attack and the number of corrupted training examples injected. We pose this attack as a novel bilevel optimization problem: construct strong poison examples that maximize the attack success rate of the trained model. We use neural tangent kernels to approximate the training dynamics of the model being attacked and automatically learn strong poison examples.
  • Backdoor Attacks in the Supply Chain of Masked Image Modeling. [paper]

    • Xinyue Shen, Xinlei He, Zheng Li, Yun Shen, Michael Backes, Yang Zhang.
    • Key Word: Backdoor Attacks; Masked Image Modeling.
    • Digest We are the first to systematically perform threat modeling on SSL across every phase of the model supply chain, i.e., the pre-training, release, and downstream phases. Our evaluation shows that models built with MIM are vulnerable to existing backdoor attacks in the release and downstream phases and are compromised by our proposed method in the pre-training phase. For instance, on CIFAR10, the attack success rate can reach 99.62%, 96.48%, and 98.89% in the downstream phase, release phase, and pre-training phase, respectively.
  • FLCert: Provably Secure Federated Learning against Poisoning Attacks. [paper]

    • Xiaoyu Cao, Zaixi Zhang, Jinyuan Jia, Neil Zhenqiang Gong. TIFS
    • Key Word: Poisoning Defenses; Federated Learning.
    • Digest We aim to bridge the gap by proposing FLCert, an ensemble federated learning framework that is provably secure against poisoning attacks with a bounded number of malicious clients. Our key idea is to divide the clients into groups, learn a global model for each group of clients using any existing federated learning method, and take a majority vote among the global models to classify a test input. A minimal sketch of this grouping-and-voting scheme follows below.
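    • Sketch A minimal sketch of the grouping-and-voting scheme; train_group_model stands in for any existing federated learning routine and is an assumption.

      ```python
      import numpy as np

      def flcert_predict(test_x, client_ids, train_group_model, num_groups=10, seed=0):
          """Partition clients into disjoint groups, train one global model per group,
          and classify each test input by majority vote over the group models."""
          rng = np.random.default_rng(seed)
          groups = np.array_split(rng.permutation(client_ids), num_groups)
          models = [train_group_model(list(g)) for g in groups]
          votes = np.stack([m.predict(test_x) for m in models])    # (num_groups, n_test)
          return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
      ```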
  • Data Poisoning Attacks Against Multimodal Encoders. [paper]

    • Ziqing Yang, Xinlei He, Zheng Li, Michael Backes, Mathias Humbert, Pascal Berrang, Yang Zhang.
    • Key Word: Data Poisoning; Multimodal Learning.
    • Digest We instead focus on answering two questions: (1) Is the linguistic modality also vulnerable to poisoning attacks? and (2) Which modality is most vulnerable? To answer the two questions, we conduct three types of poisoning attacks against CLIP, the most representative multimodal contrastive learning framework.
  • Augmentation Backdoors. [paper] [code]

    • Joseph Rance, Yiren Zhao, Ilia Shumailov, Robert Mullins.
    • Key Word: Data Augmentation; Backdoor Attacks.
    • Digest We present three backdoor attacks that can be covertly inserted into data augmentation. Our attacks each insert a backdoor using a different type of computer vision augmentation transform, covering simple image transforms, GAN-based augmentation, and composition-based augmentation. By inserting the backdoor using these augmentation transforms, we make our backdoors difficult to detect, while still supporting arbitrary backdoor functionality.
  • RIBAC: Towards Robust and Imperceptible Backdoor Attack against Compact DNN. [paper] [code]

    • Huy Phan, Cong Shi, Yi Xie, Tianfang Zhang, Zhuohang Li, Tianming Zhao, Jian Liu, Yan Wang, Yingying Chen, Bo Yuan. ECCV 2022
    • Key Word: Backdoor Attack; Model Compression.
    • Digest We propose to study and develop Robust and Imperceptible Backdoor Attack against Compact DNN models (RIBAC). By performing systematic analysis and exploration of the important design knobs, we propose a framework that can learn the proper trigger patterns, model parameters and pruning masks in an efficient way, thereby achieving high trigger stealthiness, a high attack success rate, and high model efficiency simultaneously.
  • Lethal Dose Conjecture on Data Poisoning. [paper]

    • Wenxiao Wang, Alexander Levine, Soheil Feizi.
    • Key Word: Data poisoning; Deep Partition Aggregation; Finite Aggregation.
    • Digest Deep Partition Aggregation (DPA) and its extension, Finite Aggregation (FA), are recent approaches for provable defenses against data poisoning, where they predict through the majority vote of many base models trained from different subsets of the training set using a given learner. The Lethal Dose Conjecture states, roughly, that if n clean samples are needed for accurate prediction, then in a training set of size N only on the order of N/n poisoned samples can be tolerated. The conjecture implies that both DPA and FA are (asymptotically) optimal -- if we have the most data-efficient learner, they can turn it into one of the most robust defenses against data poisoning. This outlines a practical approach to developing stronger defenses against poisoning via finding data-efficient learners.
  • Data-free Backdoor Removal based on Channel Lipschitzness. [paper] [code]

    • Runkai Zheng, Rongjun Tang, Jianze Li, Li Liu. ECCV 2022
    • Key Word: Backdoor Defense; Lipschitz Constant; Model pruning.
    • Digest We introduce a novel concept called the Channel Lipschitz Constant (CLC), which is defined as the Lipschitz constant of the mapping from the input images to the output of each channel. We then provide empirical evidence of the strong correlation between an upper bound of the CLC (UCLC) and the trigger-activated change in the channel activation. Since the UCLC can be directly calculated from the weight matrices, we can detect the potential backdoor channels in a data-free manner and perform simple pruning on the infected DNN to repair the model. A crude pruning sketch follows below.
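    • Sketch A crude, data-free pruning sketch in the spirit of UCLC: treat the per-channel weight norm of a convolution as an upper-bound proxy for the channel Lipschitz constant and zero out outlier channels. The norm choice and the mean + lambda*std threshold are assumptions, not the paper's exact bound.

      ```python
      import torch
      import torch.nn as nn

      def prune_outlier_channels(conv: nn.Conv2d, lam: float = 3.0):
          """Zero out output channels whose weight norm (a rough Lipschitz upper bound)
          is an outlier; returns the indices of the pruned channels."""
          with torch.no_grad():
              norms = conv.weight.flatten(1).norm(dim=1)   # one norm per output channel
              suspicious = norms > norms.mean() + lam * norms.std()
              conv.weight[suspicious] = 0.0
              if conv.bias is not None:
                  conv.bias[suspicious] = 0.0
          return suspicious.nonzero(as_tuple=True)[0]
      ```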
  • Just Rotate it: Deploying Backdoor Attacks via Rotation Transformation. [paper]

    • Tong Wu, Tianhao Wang, Vikash Sehwag, Saeed Mahloujifar, Prateek Mittal.
    • Key Word: Backdoor Attacks; Object Detection.
    • Digest Our method constructs the poisoned dataset by rotating a limited number of objects and labeling them incorrectly; once trained with it, the victim's model will make undesirable predictions during run-time inference. It exhibits a significantly high attack success rate while maintaining clean performance, as shown through comprehensive empirical studies on image classification and object detection tasks. A toy version of the poisoning step follows below.
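    • Sketch A toy version of the poisoning step (rotate a small fraction of training images and relabel them with the attacker's target class); the angle, poisoning fraction, and HxWxC layout are assumptions.

      ```python
      import numpy as np
      from scipy.ndimage import rotate

      def rotation_poison(images, labels, target_label, poison_frac=0.01, angle=90, seed=0):
          """Rotate a random subset of (H, W, C) training images and mislabel them; at test
          time the attacker triggers the backdoor by presenting a rotated object."""
          rng = np.random.default_rng(seed)
          idx = rng.choice(len(images), int(poison_frac * len(images)), replace=False)
          for i in idx:
              images[i] = rotate(images[i], angle, axes=(1, 0), reshape=False, mode="nearest")
              labels[i] = target_label
          return images, labels
      ```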
  • Suppressing Poisoning Attacks on Federated Learning for Medical Imaging. [paper] [code]

    • Naif Alkhunaizi, Dmitry Kamzolov, Martin Takáč, Karthik Nandakumar.
    • Key Word: Poisoning Attacks; Federated Learning; Medical Imaging; Healthcare.
    • Digest We propose a robust aggregation rule called Distance-based Outlier Suppression (DOS) that is resilient to Byzantine failures. The proposed method computes the distance between local parameter updates of different clients and obtains an outlier score for each client using Copula-based Outlier Detection (COPOD). The resulting outlier scores are converted into normalized weights using a softmax function, and a weighted average of the local parameters is used for updating the global model. A simplified sketch of this aggregation rule follows below.
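    • Sketch A simplified version of the aggregation rule: score each client update by its distance to the others, turn the negated scores into softmax weights, and average. The paper uses COPOD for the outlier score; the mean pairwise distance below is a simple stand-in, and the temperature is an assumed knob.

      ```python
      import numpy as np

      def dos_aggregate(updates, temperature=1.0):
          """updates: (n_clients, d) flattened local parameter updates.
          Returns the outlier-suppressed weighted average used to update the global model."""
          dists = np.linalg.norm(updates[:, None, :] - updates[None, :, :], axis=-1)
          scores = dists.mean(axis=1)              # higher = more anomalous (COPOD in the paper)
          weights = np.exp(-scores / temperature)
          weights /= weights.sum()                 # softmax over negated outlier scores
          return (weights[:, None] * updates).sum(axis=0)
      ```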
  • When does Bias Transfer in Transfer Learning? [paper] [code]

    • Hadi Salman, Saachi Jain, Andrew Ilyas, Logan Engstrom, Eric Wong, Aleksander Madry.
    • Key Word: Backdoor Attacks; Bias Transfer.
    • Digest Using transfer learning to adapt a pre-trained "source model" to a downstream "target task" can dramatically increase performance with seemingly no downside. In this work, we demonstrate that there can exist a downside after all: bias transfer, or the tendency for biases of the source model to persist even after adapting the model to the target class. Through a combination of synthetic and natural experiments, we show that bias transfer both (a) arises in realistic settings (such as when pre-training on ImageNet or other standard datasets) and (b) can occur even when the target dataset is explicitly de-biased.
  • Backdoor Attack is A Devil in Federated GAN-based Medical Image Synthesis. [paper]

    • Ruinan Jin, Xiaoxiao Li.
    • Key Word: Backdoor Attacks; Federated Learning; Generative Adversarial Nets; Medical Image; Healthcare.
    • Digest We propose a way of attacking federated GAN (FedGAN) by poisoning the discriminator with a data poisoning strategy commonly used in backdoor attacks on classification models. We demonstrate that adding a small trigger, with a size of less than 0.5 percent of the original image size, can corrupt the FL-GAN model. Based on the proposed attack, we provide two effective defense strategies: global malicious detection and local training regularization.
  • BackdoorBench: A Comprehensive Benchmark of Backdoor Learning. [paper] [code]

    • Baoyuan Wu, Hongrui Chen, Mingda Zhang, Zihao Zhu, Shaokui Wei, Danni Yuan, Chao Shen, Hongyuan Zha.
    • Key Word: Backdoor Learning; Benchmark.
    • Digest We find that the evaluations of new methods are often not thorough enough to verify their claims and real performance, mainly due to the rapid development, diverse settings, and the difficulties of implementation and reproducibility. Without thorough evaluations and comparisons, it is difficult to track the current progress and design the future development roadmap of the literature. To alleviate this dilemma, we build a comprehensive benchmark of backdoor learning, called BackdoorBench. It consists of an extensible, modular codebase (currently including implementations of 8 state-of-the-art (SOTA) attack and 9 SOTA defense algorithms), as well as a standardized protocol for the complete backdoor learning pipeline. We also provide comprehensive evaluations of every pair of the 8 attacks against the 9 defenses, with 5 poisoning ratios, based on 5 models and 4 datasets, thus 8,000 pairs of evaluations in total.
  • zPROBE: Zero Peek Robustness Checks for Federated Learning. [paper]

    • Zahra Ghodsi, Mojan Javaheripi, Nojan Sheybani, Xinqiao Zhang, Ke Huang, Farinaz Koushanfar.
    • Key Word: Byzantine Attacks; Federated Learning; Zero-Knowledge Proof.
    • Digest We establish the first private robustness check that uses high break point rank-based statistics on aggregated model updates. By exploiting randomized clustering, we significantly improve the scalability of our defense without compromising privacy. We leverage the derived statistical bounds in zero-knowledge proofs to detect and remove malicious updates without revealing the private user updates. Our novel framework, zPROBE, enables Byzantine resilient and secure federated learning.
  • Natural Backdoor Datasets. [paper] [code]

    • Emily Wenger, Roma Bhattacharjee, Arjun Nitin Bhagoji, Josephine Passananti, Emilio Andere, Haitao Zheng, Ben Y. Zhao.
    • Key Word: Natural Backdoor Attacks.
    • Digest Extensive literature on backdoor poison attacks has studied attacks and defenses for backdoors using "digital trigger patterns." In contrast, "physical backdoors" use physical objects as triggers, have only recently been identified, and are qualitatively different enough to resist all defenses targeting digital trigger backdoors. Research on physical backdoors is limited by access to large datasets containing real images of physical objects co-located with targets of classification. Building these datasets is time- and labor-intensive. This work seeks to address the challenge of accessibility for research on physical backdoor attacks. We hypothesize that there may be naturally occurring physically co-located objects already present in popular datasets such as ImageNet. Once identified, a careful relabeling of these data can transform them into training samples for physical backdoor attacks. We propose a method to scalably identify these subsets of potential triggers in existing datasets, along with the specific classes they can poison.
  • Neurotoxin: Durable Backdoors in Federated Learning. [paper]

    • Zhengming Zhang, Ashwinee Panda, Linyue Song, Yaoqing Yang, Michael W. Mahoney, Joseph E. Gonzalez, Kannan Ramchandran, Prateek Mittal. ICML 2022
    • Key Word: Backdoor Attacks; Federated Learning.
    • Digest Prior work has shown that backdoors can be inserted into FL models, but these backdoors are often not durable, i.e., they do not remain in the model after the attacker stops uploading poisoned updates. Thus, since training typically continues progressively in production FL systems, an inserted backdoor may not survive until deployment. Here, we propose Neurotoxin, a simple one-line modification to existing backdoor attacks that acts by attacking parameters that are changed less in magnitude during training.
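    • Sketch The "one-line" idea, sketched: mask the malicious update off the coordinates that benign training changes the most, so the backdoor is written into rarely-updated parameters. The benign-update estimate and the top fraction are assumptions.

      ```python
      import torch

      def neurotoxin_project(malicious_update, benign_estimate, top_frac=0.1):
          """Zero the attacker's update on the coordinates with the largest benign-update
          magnitude (e.g., estimated from the previous global model delta)."""
          k = int(top_frac * benign_estimate.numel())
          top_idx = benign_estimate.abs().flatten().topk(k).indices
          mask = torch.ones(benign_estimate.numel(), device=malicious_update.device)
          mask[top_idx] = 0.0
          return malicious_update * mask.view_as(malicious_update)
      ```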
  • Backdoor Attacks on Vision Transformers. [paper] [code]

    • Akshayvarun Subramanya, Aniruddha Saha, Soroush Abbasi Koohpayegani, Ajinkya Tejankar, Hamed Pirsiavash.
    • Key Word: Backdoor Attacks; Vision Transformers.
    • Digest We are the first to show that ViTs are vulnerable to backdoor attacks. We also find an intriguing difference between ViTs and CNNs - interpretation algorithms effectively highlight the trigger on test images for ViTs but not for CNNs. Based on this observation, we propose a test-time image blocking defense for ViTs which reduces the attack success rate by a large margin.
  • Autoregressive Perturbations for Data Poisoning. [paper] [code]

    • Pedro Sandoval-Segura, Vasu Singla, Jonas Geiping, Micah Goldblum, Tom Goldstein, David W. Jacobs.
    • Key Word: Poisoning Attacks.
    • Digest We introduce autoregressive (AR) poisoning, a method that can generate poisoned data without access to the broader dataset. The proposed AR perturbations are generic, can be applied across different datasets, and can poison different architectures. Compared to existing unlearnable methods, our AR poisons are more resistant against common defenses such as adversarial training and strong data augmentations. Our analysis further provides insight into what makes an effective data poison.
  • Backdoor Defense via Decoupling the Training Process. [paper] [code]

    • Kunzhe Huang, Yiming Li, Baoyuan Wu, Zhan Qin, Kui Ren. ICLR 2022
    • Key Word: Backdoor Defenses.
    • Digest We reveal that poisoned samples tend to cluster together in the feature space of the attacked DNN model, which is mostly due to the end-to-end supervised training paradigm. Inspired by this observation, we propose a novel backdoor defense via decoupling the original end-to-end training process into three stages. Specifically, we first learn the backbone of a DNN model via self-supervised learning based on training samples without their labels.
  • Post-Training Detection of Backdoor Attacks for Two-Class and Multi-Attack Scenarios. [paper] [code]

    • Zhen Xiang, David J. Miller, George Kesidis. ICLR 2022
    • Key Word: Backdoor Detection; Adversarial Training.
    • Digest We propose a detection framework based on BP reverse-engineering and a novel expected transferability (ET) statistic. We show that our ET statistic is effective using the same detection threshold, irrespective of the classification domain, the attack configuration, and the BP reverse-engineering algorithm that is used.

Privacy

  • Personalized Federated Learning with Adaptive Batchnorm for Healthcare. [paper]

    • Wang Lu, Jindong Wang, Yiqiang Chen, Xin Qin, Renjun Xu, Dimitrios Dimitriadis, and Tao Qin. IEEE TBD 2022
    • Key Word: Federated learning; Batch normalization; Personalized FL
    • Digest We propose FedAP to tackle domain shifts and then obtain personalized models for local clients. FedAP learns the similarity between clients based on the statistics of the batch normalization layers while preserving the specificity of each client with different local batch normalization.
  • MetaFed: Federated Learning among Federations with Cyclic Knowledge Distillation for Personalized Healthcare. [paper]

    • Yiqiang Chen, Wang Lu, Xin Qin, Jindong Wang, and Xing Xie. IJCAI'FL 2022
    • Key Word: Federated learning; Personalized FL
    • Digest MetaFed aims to facilitate trustworthy FL between different federations. MetaFed obtains a personalized model for each federation without a central server via the proposed Cyclic Knowledge Distillation. Specifically, MetaFed treats each federation as a meta distribution and aggregates the knowledge of each federation in a cyclic manner.
  • Privacy Induces Robustness: Information-Computation Gaps and Sparse Mean Estimation. [paper]

    • Kristian Georgiev, Samuel B. Hopkins.
    • Key Word: Differential Privacy; Automatic Robustness.
    • Digest We establish a simple connection between robust and differentially-private algorithms: private mechanisms which perform well with very high probability are automatically robust in the sense that they retain accuracy even if a constant fraction of the samples they receive are adversarially corrupted. Since optimal mechanisms typically achieve these high success probabilities, our results imply that optimal private mechanisms for many basic statistics problems are robust.
  • Machine Unlearning of Federated Clusters. [paper]

    • Chao Pan, Jin Sima, Saurav Prakash, Vishal Rana, Olgica Milenkovic.
    • Key Word: Machine Unlearning; Federated Learning.
    • Digest This work proposes the first known unlearning mechanism for federated clustering with privacy criteria that support simple, provable, and efficient data removal at the client and server level. The gist of our approach is to combine special initialization procedures with quantization methods that allow for secure aggregation of estimated local cluster counts at the server unit. As part of our platform, we introduce secure compressed multiset aggregation (SCMA), which is of independent interest for secure sparse model aggregation.
  • Analyzing Privacy Leakage in Machine Learning via Multiple Hypothesis Testing: A Lesson From Fano. [paper]

    • Chuan Guo, Alexandre Sablayrolles, Maziar Sanjabi.
    • Key Word: Differential Privacy; Reconstruction Attacks; Fano’s Inequality.
    • Digest We study data reconstruction attacks for discrete data and analyze it under the framework of multiple hypothesis testing. We utilize different variants of the celebrated Fano's inequality to derive upper bounds on the inferential power of a data reconstruction adversary when the model is trained differentially privately. Importantly, we show that if the underlying private data takes values from a set of size M, then the target privacy parameter ϵ can be O(logM) before the adversary gains significant inferential power.
  • On the Robustness of Dataset Inference. [paper]

    • Sebastian Szyller, Rui Zhang, Jian Liu, N. Asokan.
    • Key Word: Dataset Inference; Data Ownership Verification.
    • Digest A fingerprinting technique introduced at ICLR '21, Dataset Inference (DI), has been shown to offer better robustness and efficiency than prior methods. The authors of DI provided a correctness proof for linear (suspect) models. However, in the same setting, we prove that DI suffers from high false positives (FPs) -- it can incorrectly identify an independent model trained with non-overlapping data from the same distribution as stolen. We further prove that DI also triggers FPs in realistic, non-linear suspect models.
  • A General Framework for Auditing Differentially Private Machine Learning. [paper]

    • Fred Lu, Joseph Munoz, Maya Fuchs, Tyler LeBlond, Elliott Zaresky-Williams, Edward Raff, Francis Ferraro, Brian Testa. NeurIPS 2022
    • Key Word: Auditing Differential Privacy.
    • Digest We present a framework to statistically audit the privacy guarantee conferred by a differentially private machine learner in practice. While previous works have taken steps toward evaluating privacy loss through poisoning attacks or membership inference, they have been tailored to specific models or have demonstrated low statistical power. Our work develops a general methodology to empirically evaluate the privacy of differentially private machine learning implementations, combining improved privacy search and verification methods with a toolkit of influence-based poisoning attacks.
  • Differentially Private Deep Learning with ModelMix. [paper]

    • Hanshen Xiao, Jun Wan, Srinivas Devadas.
    • Key Word: Differential Privacy; Clipped Stochastic Gradient Descent.
    • Digest We provide rigorous analyses for both the utility guarantees and privacy amplification of ModelMix. In particular, we present a formal study on the effect of gradient clipping in DP-SGD, which provides theoretical instruction on how hyper-parameters should be selected. We also introduce a refined gradient clipping method, which can further sharpen the privacy loss in private learning when combined with ModelMix.
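    • Sketch For context on the clipping being analyzed, a textbook DP-SGD gradient step (per-example clipping plus Gaussian noise); this is standard DP-SGD, not the ModelMix mechanism or the paper's refined clipping method.

      ```python
      import torch

      def dp_sgd_noisy_grad(per_example_grads, clip_norm, noise_multiplier):
          """per_example_grads: (batch, d). Clip each row to L2 norm <= clip_norm, sum,
          add Gaussian noise calibrated to that sensitivity, and average."""
          norms = per_example_grads.norm(dim=1, keepdim=True)
          clipped = per_example_grads * (clip_norm / norms).clamp(max=1.0)
          noise = noise_multiplier * clip_norm * torch.randn(per_example_grads.shape[1])
          return (clipped.sum(dim=0) + noise) / per_example_grads.shape[0]
      ```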
  • Fine-Tuning with Differential Privacy Necessitates an Additional Hyperparameter Search. [paper]

    • Yannis Cattan, Christopher A. Choquette-Choo, Nicolas Papernot, Abhradeep Thakurta.
    • Key Word: Fine-Tuning; Differential Privacy.
    • Digest In this work, we identify an oversight of existing approaches for differentially private fine-tuning: they do not tailor the fine-tuning approach to the specifics of learning with privacy. Our main result is to show how carefully selecting the layers being fine-tuned in the pretrained neural network allows us to establish new state-of-the-art trade-offs between privacy and accuracy.
  • Data Leakage in Tabular Federated Learning. [paper]

    • Mark Vero, Mislav Balunović, Dimitar I. Dimitrov, Martin Vechev.
    • Key Word: Federated Learning; Data Leakage Attacks.
    • Digest Compared to the NLP and image domains, reconstruction of tabular data poses several unique challenges: (i) categorical features introduce a significantly more difficult mixed discrete-continuous optimization problem, (ii) the mix of categorical and continuous features causes high variance in the final reconstructions, and (iii) structured data makes it difficult for the adversary to judge reconstruction quality. In this work, we tackle these challenges and propose the first comprehensive reconstruction attack on tabular data, called TabLeak. TabLeak is based on three key ingredients: (i) a softmax structural prior, implicitly converting the mixed discrete-continuous optimization problem into an easier fully continuous one, (ii) a way to reduce the variance of our reconstructions through a pooled ensembling scheme exploiting the structure of tabular data, and (iii) an entropy measure which can successfully assess reconstruction quality.
  • Certified Data Removal in Sum-Product Networks. [paper]

    • Alexander Becker, Thomas Liebig. ICKG 2022
    • Key Word: Machine Unlearning; Sum-Product Networks.
    • Digest Data protection regulations like the GDPR or the California Consumer Privacy Act give users more control over the data that is collected about them. Deleting the collected data is often insufficient to guarantee data privacy since it is often used to train machine learning models, which can expose information about the training data. Thus, a guarantee that a trained model does not expose information about its training data is additionally needed. In this paper, we present UnlearnSPN -- an algorithm that removes the influence of single data points from a trained sum-product network and thereby allows fulfilling data privacy requirements on demand.
  • Membership Inference Attacks Against Text-to-image Generation Models. [paper]

    • Yixin Wu, Ning Yu, Zheng Li, Michael Backes, Yang Zhang.
    • Key Word: Membership Inference Attacks; Text-to-Image Generation.
    • Digest We perform the first privacy analysis of text-to-image generation models through the lens of membership inference. Specifically, we propose three key intuitions about membership information and design four attack methodologies accordingly. We conduct comprehensive evaluations on two mainstream text-to-image generation models including sequence-to-sequence modeling and diffusion-based modeling.
  • UnGANable: Defending Against GAN-based Face Manipulation. [paper]

    • Zheng Li, Ning Yu, Ahmed Salem, Michael Backes, Mario Fritz, Yang Zhang. USENIX Security 2023
    • Key Word: GAN Inversion Attacks; Latent Code Manipulation.
    • Digest We propose the first defense system, namely UnGANable, against GAN-inversion-based face manipulation. Specifically, UnGANable focuses on defending against GAN inversion, an essential step for face manipulation. Its core technique is to search for alternative images (called cloaked images) around the original images (called target images) in image space.
  • Untargeted Backdoor Watermark: Towards Harmless and Stealthy Dataset Copyright Protection. [paper] [code]

    • Yiming Li, Yang Bai, Yong Jiang, Yong Yang, Shu-Tao Xia, Bo Li. NeurIPS 2022
    • Key Word: Data Watermarking; Defenses against Data Leakage.
    • Digest We explore the untargeted backdoor watermarking scheme, where the abnormal model behaviors are not deterministic. Specifically, we introduce two dispersibilities and prove their correlation, based on which we design the untargeted backdoor watermark under both poisoned-label and clean-label settings. We also discuss how to use the proposed untargeted backdoor watermark for dataset ownership verification.
  • No Free Lunch in "Privacy for Free: How does Dataset Condensation Help Privacy". [paper]

    • Nicholas Carlini, Vitaly Feldman, Milad Nasr.
    • Key Word: Differential Privacy; Dataset Condensation.
    • Digest New methods designed to preserve data privacy require careful scrutiny. Failure to preserve privacy is hard to detect, and yet can lead to catastrophic results when a system implementing a "privacy-preserving" method is attacked. A recent work selected for an Outstanding Paper Award at ICML 2022 (Dong et al., 2022) claims that dataset condensation (DC) significantly improves data privacy when training machine learning models. This claim is supported by theoretical analysis of a specific dataset condensation technique and an empirical evaluation of resistance to some existing membership inference attacks.
  • Algorithms that Approximate Data Removal: New Results and Limitations. [paper]

    • Vinith M. Suriyakumar, Ashia C. Wilson. NeurIPS 2022
    • Key Word: Data Removal; Machine Unlearning.
    • Digest We study the problem of deleting user data from machine learning models trained using empirical risk minimization. Our focus is on learning algorithms which return the empirical risk minimizer and approximate unlearning algorithms that comply with deletion requests arriving in streaming minibatches. Leveraging the infinitesimal jackknife, we develop an online unlearning algorithm that is both computationally and memory efficient. Unlike prior memory-efficient unlearning algorithms, we target models that minimize objectives with non-smooth regularizers, such as the commonly used ℓ1, elastic net, or nuclear norm penalties.
  • Deep Learning-based Anonymization of Chest Radiographs: A Utility-preserving Measure for Patient Privacy. [paper]

    • Kai Packhäuser, Sebastian Gündel, Florian Thamm, Felix Denzinger, Andreas Maier.
    • Key Word: Image Anonymization; Patient Privacy; Data Utility; Chest Radiographs.
    • Digest We propose the first deep learning-based approach to targetedly anonymize chest radiographs while maintaining data utility for diagnostic and machine learning purposes. Our model architecture is a composition of three independent neural networks that, when collectively used, allow for learning a deformation field that is able to impede patient re-identification. The individual influence of each component is investigated with an ablation study. Quantitative results on the ChestX-ray14 dataset show a reduction of patient re-identification from 81.8% to 58.6% in the area under the receiver operating characteristic curve (AUC) with little impact on the abnormality classification performance.
  • In Differential Privacy, There is Truth: On Vote Leakage in Ensemble Private Learning. [paper]

    • Jiaqi Wang, Roei Schuster, Ilia Shumailov, David Lie, Nicolas Papernot. NeurIPS 2022
    • Key Word: Differential Privacy.
    • Digest We observe that this use of noise, which makes PATE predictions stochastic, enables new forms of leakage of sensitive information. For a given input, our adversary exploits this stochasticity to extract high-fidelity histograms of the votes submitted by the underlying teachers. From these histograms, the adversary can learn sensitive attributes of the input such as race, gender, or age.
  • Dataset Inference for Self-Supervised Models. [paper]

    • Adam Dziedzic, Haonan Duan, Muhammad Ahmad Kaleem, Nikita Dhawan, Jonas Guan, Yannis Cattan, Franziska Boenisch, Nicolas Papernot. NeurIPS 2022
    • Key Word: Dataset Inference; Model Stealing Attacks.
    • Digest We introduce a new dataset inference defense, which uses the private training set of the victim encoder model to attribute its ownership in the event of stealing. The intuition is that the log-likelihood of an encoder's output representations is higher on the victim's training data than on test data if it is stolen from the victim, but not if it is independently trained. We compute this log-likelihood using density estimation models. As part of our evaluation, we also propose measuring the fidelity of stolen encoders and quantifying the effectiveness of the theft detection without involving downstream tasks; instead, we leverage mutual information and distance measurements.
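    • Sketch A minimal sketch of the likelihood-gap test: fit a density model on the suspect encoder's representations of the victim's training data and compare mean log-likelihoods on training vs. held-out data. The Gaussian mixture is a stand-in density estimator and the decision rule is illustrative only.

      ```python
      import numpy as np
      from sklearn.mixture import GaussianMixture

      def likelihood_gap(encode, train_x, heldout_x, n_components=8, seed=0):
          """encode: callable mapping a batch of inputs to representation vectors.
          Returns mean log-likelihood(train) - mean log-likelihood(held-out); a large
          positive gap is evidence the suspect encoder was trained on train_x."""
          z_train, z_out = encode(train_x), encode(heldout_x)
          gm = GaussianMixture(n_components=n_components, random_state=seed).fit(z_train)
          return gm.score(z_train) - gm.score(z_out)   # .score = mean log-likelihood per sample
      ```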
  • Membership Inference Attacks and Generalization: A Causal Perspective. [paper]

    • Teodora Baluta, Shiqi Shen, S. Hitarth, Shruti Tople, Prateek Saxena.
    • Key Word: Membership Inference Attacks; Causal Reasoning.
    • Digest We propose the first approach to explain MI attacks and their connection to generalization based on principled causal reasoning. We offer causal graphs that quantitatively explain the observed MI attack performance achieved for 6 attack variants. We refute several prior non-quantitative hypotheses that over-simplify or over-estimate the influence of underlying causes, thereby failing to capture the complex interplay between several factors.
  • Renyi Differential Privacy of Propose-Test-Release and Applications to Private and Robust Machine Learning. [paper]

    • Jiachen T. Wang, Saeed Mahloujifar, Shouda Wang, Ruoxi Jia, Prateek Mittal. NeurIPS 2022
    • Key Word: Renyi Differential Privacy; Propose-Test-Release.
    • Digest Propose-Test-Release (PTR) is a differential privacy framework that works with local sensitivity of functions, instead of their global sensitivity. This framework is typically used for releasing robust statistics such as median or trimmed mean in a differentially private manner. While PTR is a common framework introduced over a decade ago, using it in applications such as robust SGD where we need many adaptive robust queries is challenging. This is mainly due to the lack of Renyi Differential Privacy (RDP) analysis, an essential ingredient underlying the moments accountant approach for differentially private deep learning. In this work, we generalize the standard PTR and derive the first RDP bound for it when the target function has bounded global sensitivity.
  • CLIPping Privacy: Identity Inference Attacks on Multi-Modal Machine Learning Models. [paper]

    • Dominik Hintersdorf, Lukas Struppek, Kristian Kersting.
    • Key Word: Multi-Modal Machine Learning; Identity Inference Attacks.
    • Digest Image-text models like CLIP have not yet been looked at in the context of privacy attacks. While membership inference attacks aim to tell whether a specific data point was used for training, we introduce a new type of privacy attack, named identity inference attack (IDIA), designed for multi-modal image-text models like CLIP. Using IDIAs, an attacker can reveal whether a particular person was part of the training data by querying the model in a black-box fashion with different images of the same person.
  • M^4I: Multi-modal Models Membership Inference. [paper] [code]

    • Pingyi Hu, Zihan Wang, Ruoxi Sun, Hu Wang, Minhui Xue. NeurIPS 2022
    • Key Word: Multi-modal Machine Learning; Membership inference.
    • Digest We propose Multi-modal Models Membership Inference (M^4I) with two attack methods to infer the membership status, named metric-based (MB) M^4I and feature-based (FB) M^4I, respectively. More specifically, MB M^4I adopts similarity metrics during the attack to infer target data membership. FB M^4I uses a pre-trained shadow multi-modal feature extractor to achieve the purpose of data inference attack by comparing the similarities between extracted input and output features.
  • Black-box Ownership Verification for Dataset Protection via Backdoor Watermarking. [paper] [code]

    • Yiming Li, Mingyan Zhu, Xue Yang, Yong Jiang, Shu-Tao Xia.
    • Key Word: Black-box Ownership Verification; Backdoor Watermarking.
    • Digest We propose to embed external patterns via backdoor watermarking for the ownership verification to protect them. Our method contains two main parts, including dataset watermarking and dataset verification. Specifically, we exploit poison-only backdoor attacks (e.g., BadNets) for dataset watermarking and design a hypothesis-test-guided method for dataset verification.
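    • Sketch A toy end-to-end sketch: stamp a BadNets-style patch onto a small fraction of the dataset with a chosen target label, then verify a suspect black-box model by testing whether triggered probes are classified as that label significantly more often than clean ones. The patch shape, poisoning rate, and the Fisher exact test are illustrative stand-ins for the paper's watermark and hypothesis test.

      ```python
      import numpy as np
      from scipy.stats import fisher_exact

      def stamp(x, value=1.0, size=3):
          x = x.copy()
          x[-size:, -size:, :] = value              # small patch in the corner of an (H, W, C) image
          return x

      def watermark(images, labels, target, frac=0.01, seed=0):
          idx = np.random.default_rng(seed).choice(len(images), int(frac * len(images)), replace=False)
          for i in idx:
              images[i], labels[i] = stamp(images[i]), target
          return images, labels

      def verify(predict, probes, target, alpha=0.01):
          """predict: black-box callable returning predicted labels for a batch of images."""
          clean = predict(probes) == target
          trig = predict(np.stack([stamp(x) for x in probes])) == target
          table = [[trig.sum(), len(trig) - trig.sum()],
                   [clean.sum(), len(clean) - clean.sum()]]
          _, p = fisher_exact(table, alternative="greater")
          return p < alpha                          # True -> claim the dataset was used for training
      ```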
  • A Framework for Evaluating Privacy-Utility Trade-off in Vertical Federated Learning. [paper]

    • Yan Kang, Jiahuan Luo, Yuanqin He, Xiaojin Zhang, Lixin Fan, Qiang Yang.
    • Key Word: Vertical Federated Learning; Privacy Attacks.
    • Digest We propose an evaluation framework that formulates the privacy-utility evaluation problem. We then use this framework as a guide to comprehensively evaluate a broad range of protection mechanisms against most of the state-of-the-art privacy attacks for three widely-deployed VFL algorithms. These evaluations may help FL practitioners select appropriate protection mechanisms given specific requirements.
  • A Survey of Machine Unlearning. [paper] [code]

    • Thanh Tam Nguyen, Thanh Trung Huynh, Phi Le Nguyen, Alan Wee-Chung Liew, Hongzhi Yin, Quoc Viet Hung Nguyen.
    • Key Word: Machine Unlearning; Survey.
    • Digest In this survey paper, we seek to provide a thorough investigation of machine unlearning in its definitions, scenarios, mechanisms, and applications. Specifically, as a categorical collection of state-of-the-art research, we hope to provide a broad reference for those seeking a primer on machine unlearning and its various formulations, design requirements, removal requests, algorithms, and uses in a variety of ML applications. Furthermore, we hope to outline key findings and trends in the paradigm as well as highlight new areas of research that have yet to see the application of machine unlearning, but could nonetheless benefit immensely.
  • Are Attribute Inference Attacks Just Imputation? [paper] [code]

    • Bargav Jayaraman, David Evans. CCS 2022
    • Key Word: Attribute Inference Attacks; Data Imputation.
    • Digest Our main conclusions are: (1) previous attribute inference methods do not reveal more about the training data from the model than can be inferred by an adversary without access to the trained model, but with the same knowledge of the underlying distribution as needed to train the attribute inference attack; (2) black-box attribute inference attacks rarely learn anything that cannot be learned without the model; but (3) white-box attacks, which we introduce and evaluate in the paper, can reliably identify some records with the sensitive value attribute that would not be predicted without having access to the model.
  • On the Privacy Risks of Cell-Based NAS Architectures. [paper] [code]

    • Hai Huang, Zhikun Zhang, Yun Shen, Michael Backes, Qi Li, Yang Zhang. CCS 2022
    • Key Word: Cell-based Neural Architecture Search; Membership Inference Attack.
    • Digest We fill this gap and systematically measure the privacy risks of NAS architectures. Leveraging the insights from our measurement study, we further explore the cell patterns of cell-based NAS architectures and evaluate how the cell patterns affect the privacy risks of NAS-searched architectures. Through extensive experiments, we shed light on how to design robust NAS architectures against privacy attacks, and also offer a general methodology to understand the hidden correlation between the NAS-searched architectures and other privacy risks.
  • Group Property Inference Attacks Against Graph Neural Networks. [paper]

    • Xiuling Wang, Wendy Hui Wang. CCS 2022
    • Key Word: Property inference attack; Graph neural networks.
    • Digest We perform the first systematic study of group property inference attacks (GPIA) against GNNs. First, we consider a taxonomy of threat models under both black-box and white-box settings with various types of adversary knowledge, and design six different attacks for these settings. We evaluate the effectiveness of these attacks through extensive experiments on three representative GNN models and three real-world graphs. Our results demonstrate the effectiveness of these attacks whose accuracy outperforms the baseline approaches.
  • An Introduction to Machine Unlearning. [paper]

    • Salvatore Mercuri, Raad Khraishi, Ramin Okhrati, Devesh Batra, Conor Hamill, Taha Ghasempour, Andrew Nowlan.
    • Key Word: Machine Unlearning; Exact Unlearning; Approximate Unlearning; Data Removal; Data Privacy.
    • Digest Removing the influence of a specified subset of training data from a machine learning model may be required to address issues such as privacy, fairness, and data quality. Retraining the model from scratch on the remaining data after removal of the subset is an effective but often infeasible option, due to its computational expense. The past few years have therefore seen several novel approaches towards efficient removal, forming the field of "machine unlearning". However, many aspects of the literature published thus far are disparate and lack consensus. In this paper, we summarise and compare seven state-of-the-art machine unlearning algorithms, consolidate definitions of core concepts used in the field, reconcile different approaches for evaluating algorithms, and discuss issues related to applying machine unlearning in practice.
  • Trading Off Privacy, Utility and Efficiency in Federated Learning. [paper]

    • Xiaojin Zhang, Yan Kang, Kai Chen, Lixin Fan, Qiang Yang.
    • Key Word: Federated Learning; Privacy.
    • Digest We propose a unified federated learning framework that reconciles horizontal and vertical federated learning. Based on this framework, we formulate and quantify the trade-offs between privacy leakage, utility loss, and efficiency reduction, which leads us to the No-Free-Lunch (NFL) theorem for the federated learning system.
  • Membership Inference Attacks by Exploiting Loss Trajectory. [paper]

    • Yiyong Liu, Zhengyu Zhao, Michael Backes, Yang Zhang. CCS 2022
    • Key Word: Membership Inference Attacks.
    • Digest Existing attack methods have commonly exploited the output information (mostly, losses) solely from the given target model. As a result, in practical scenarios where both the member and non-member samples yield similarly small losses, these methods are naturally unable to differentiate between them. To address this limitation, we propose a new attack method that exploits the membership information contained in the whole training process of the target model to improve the attack performance.
  • SNAP: Efficient Extraction of Private Properties with Poisoning. [paper]

    • Harsh Chaudhari, John Abascal, Alina Oprea, Matthew Jagielski, Florian Tramèr, Jonathan Ullman.
    • Key Word: Property Inference Attacks; Poisoning Attacks.
    • Digest We consider the setting of property inference attacks in which the attacker can poison a subset of the training dataset and query the trained target model. Motivated by our theoretical analysis of model confidences under poisoning, we design an efficient property inference attack, SNAP, which obtains higher attack success and requires lower amounts of poisoning than the state-of-the-art poisoning-based property inference attack by Mahloujifar et al.
  • Auditing Membership Leakages of Multi-Exit Networks. [paper]

    • Zheng Li, Yiyong Liu, Xinlei He, Ning Yu, Michael Backes, Yang Zhang. CCS 2022
    • Key Word: Membership Inference Attacks; Multi-Exit Networks.
    • Digest In this paper, we perform the first privacy analysis of multi-exit networks through the lens of membership leakages. In particular, we first leverage the existing attack methodologies to quantify the multi-exit networks' vulnerability to membership leakages. Our experimental results show that multi-exit networks are less vulnerable to membership leakages and the exit (number and depth) attached to the backbone model is highly correlated with the attack performance.
  • On the Design of Privacy-Aware Cameras: a Study on Deep Neural Networks. [paper] [code]

    • Marcela Carvalho, Oussama Ennaffi, Sylvain Chateau, Samy Ait Bachir.
    • Key Word: Smart City; Privacy-Aware Camera.
    • Digest In this paper, the effect of camera distortions is studied using deep learning techniques commonly used to extract sensitive data. To do so, we simulate out-of-focus images corresponding to a realistic conventional camera with fixed focal length, aperture, and focus, as well as grayscale images coming from a monochrome camera. We then show, through an experimental study, that we can build a privacy-aware camera from which personal information such as license plate numbers cannot be extracted.
  • Joint Privacy Enhancement and Quantization in Federated Learning. [paper] [code]

    • Natalie Lang, Elad Sofer, Tomer Shaked, Nir Shlezinger.
    • Key Word: Federated Learning; Quantization; Privacy.
    • Digest We propose a method coined joint privacy enhancement and quantization (JoPEQ), which jointly implements lossy compression and privacy enhancement in FL settings. In particular, JoPEQ utilizes vector quantization based on random lattice, a universal compression technique whose byproduct distortion is statistically equivalent to additive noise. This distortion is leveraged to enhance privacy by augmenting the model updates with dedicated multivariate privacy preserving noise.
  • Split-U-Net: Preventing Data Leakage in Split Learning for Collaborative Multi-Modal Brain Tumor Segmentation. [paper]

    • Holger R. Roth, Ali Hatamizadeh, Ziyue Xu, Can Zhao, Wenqi Li, Andriy Myronenko, Daguang Xu.
    • Key Word: Split Learning; Vertical Federated Learning; Multi-Modal Brain Tumor Segmentation; Data Inversion.
    • Digest We propose "Split-U-Net" and successfully apply SL for collaborative biomedical image segmentation. Nonetheless, SL requires the exchanging of intermediate activation maps and gradients to allow training models across different feature spaces, which might leak data and raise privacy concerns. Therefore, we also quantify the amount of data leakage in common SL scenarios for biomedical image segmentation and provide ways to counteract such leakage by applying appropriate defense strategies.
  • Membership-Doctor: Comprehensive Assessment of Membership Inference Against Machine Learning Models. [paper]

    • Xinlei He, Zheng Li, Weilin Xu, Cory Cornelius, Yang Zhang.
    • Key Word: Membership Inference Attacks and Defenses; Benchmark.
    • Digest We fill this gap by presenting a large-scale measurement of different membership inference attacks and defenses. We systematize membership inference through the study of nine attacks and six defenses and measure the performance of different attacks and defenses in the holistic evaluation. We then quantify the impact of the threat model on the results of these attacks. We find that some assumptions of the threat model, such as same-architecture and same-distribution between shadow and target models, are unnecessary. We are also the first to execute attacks on the real-world data collected from the Internet, instead of laboratory datasets.
  • SoK: Machine Learning with Confidential Computing. [paper]

    • Fan Mo, Zahra Tarkhani, Hamed Haddadi.
    • Key Word: Survey; Confidential Computing; Trusted Execution Environment; Integrity.
    • Digest We systematize the findings on confidential computing-assisted ML security and privacy techniques for providing i) confidentiality guarantees and ii) integrity assurances. We further identify key challenges and provide dedicated analyses of the limitations of existing Trusted Execution Environment (TEE) systems for ML use cases. We discuss prospective works, including grounded privacy definitions, partitioned ML executions, dedicated TEE designs for ML, TEE-aware ML, and ML full pipeline guarantees. These potential solutions can help achieve much stronger TEE-enabled ML privacy guarantees without introducing significant computation and system costs.
  • Inferring Sensitive Attributes from Model Explanations. [paper] [code]

    • Vasisht Duddu, Antoine Boutet. CIKM 2022
    • Key Word: Attribute Inference Attacks; Model Explanations.
    • Digest We focus on the specific privacy risk of attribute inference attacks, wherein an adversary infers sensitive attributes of an input (e.g., race and sex) given its model explanations. We design the first attribute inference attack against model explanations in two threat models where the model builder either (a) includes the sensitive attributes in the training data and input or (b) censors the sensitive attributes by not including them in the training data and input.
  • On the Privacy Effect of Data Enhancement via the Lens of Memorization. [paper]

    • Xiao Li, Qiongxiu Li, Zhanhao Hu, Xiaolin Hu.
    • Key Word: Membership Inference Attacks; Data Augmentation; Adversarial Training.
    • Digest We propose to investigate privacy from a new perspective called memorization. Through the lens of memorization, we find that previously deployed MIAs produce misleading results as they are less likely to identify samples with higher privacy risks as members compared to samples with low privacy risks. To solve this problem, we deploy a recent attack that can capture the memorization degrees of individual samples for evaluation. Through extensive experiments, we unveil non-trivial findings about the connections between three important properties of machine learning models, including privacy, generalization gap, and adversarial robustness.
  • Private Domain Adaptation from a Public Source. [paper]

    • Raef Bassily, Mehryar Mohri, Ananda Theertha Suresh.
    • Key Word: Domain Adaptation; Differential Privacy; Frank-Wolfe Algorithm; Mirror Descent Algorithm.
    • Digest In regression problems with no privacy constraints on the source or target data, a discrepancy minimization algorithm based on several theoretical guarantees was shown to outperform a number of other adaptation algorithm baselines. Building on that approach, we design differentially private discrepancy-based algorithms for adaptation from a source domain with public labeled data to a target domain with unlabeled private data. The design and analysis of our private algorithms critically hinge upon several key properties we prove for a smooth approximation of the weighted discrepancy, such as its smoothness with respect to the ℓ1-norm and the sensitivity of its gradient.
  • Dropout is NOT All You Need to Prevent Gradient Leakage. [paper]

    • Daniel Scheliga, Patrick Mäder, Marco Seeland.
    • Key Word: Dropout; Gradient Inversion Attacks.
    • Digest Recent observations suggest that dropout could mitigate gradient leakage and improve model utility if added to neural networks. Unfortunately, this phenomenon has not been systematically researched yet. In this work, we thoroughly analyze the effect of dropout on iterative gradient inversion attacks. We find that state-of-the-art attacks are not able to reconstruct the client data due to the stochasticity induced by dropout during model training. Nonetheless, we argue that dropout does not offer reliable protection if the dropout-induced stochasticity is adequately modeled during attack optimization. Consequently, we propose a novel Dropout Inversion Attack (DIA) that jointly optimizes for client data and dropout masks to approximate the stochastic client model.
  • On the Fundamental Limits of Formally (Dis)Proving Robustness in Proof-of-Learning. [paper]

    • Congyu Fang, Hengrui Jia, Anvith Thudi, Mohammad Yaghini, Christopher A. Choquette-Choo, Natalie Dullerud, Varun Chandrasekaran, Nicolas Papernot.
    • Key Word: Proof-of-Learning; Adversarial Examples.
    • Digest Proof-of-learning (PoL) proposes a model owner use machine learning training checkpoints to establish a proof of having expended the necessary compute for training. The authors of PoL forego cryptographic approaches and trade rigorous security guarantees for scalability to deep learning by being applicable to stochastic gradient descent and adaptive variants. This lack of formal analysis leaves the possibility that an attacker may be able to spoof a proof for a model they did not train. We contribute a formal analysis of why the PoL protocol cannot be formally (dis)proven to be robust against spoofing adversaries. To do so, we disentangle the two roles of proof verification in PoL: (a) efficiently determining if a proof is a valid gradient descent trajectory, and (b) establishing precedence by making it more expensive to craft a proof after training completes (i.e., spoofing).
  • Semi-Leak: Membership Inference Attacks Against Semi-supervised Learning. [paper] [code]

    • Xinlei He, Hongbin Liu, Neil Zhenqiang Gong, Yang Zhang. ECCV 2022
    • Key Word: Membership Inference Attacks; Semi-Supervised Learning.
    • Digest We take a different angle by studying the training data privacy of SSL. Specifically, we propose the first data augmentation-based membership inference attacks against ML models trained by SSL. Given a data sample and the black-box access to a model, the goal of membership inference attack is to determine whether the data sample belongs to the training dataset of the model. Our evaluation shows that the proposed attack can consistently outperform existing membership inference attacks and achieves the best performance against the model trained by SSL.
  • Learnable Privacy-Preserving Anonymization for Pedestrian Images. [paper] [code]

    • Junwu Zhang, Mang Ye, Yao Yang. MM 2022
    • Key Word: Privacy Protection; Person Re-Identification.
    • Digest This paper studies a novel privacy-preserving anonymization problem for pedestrian images, which preserves personal identity information (PII) for authorized models and prevents PII from being recognized by third parties. Conventional anonymization methods unavoidably cause semantic information loss, leading to limited data utility. Besides, existing learned anonymization techniques, while retaining various identity-irrelevant utilities, will change the pedestrian identity, and thus are unsuitable for training robust re-identification models. To explore the privacy-utility trade-off for pedestrian images, we propose a joint learning reversible anonymization framework, which can reversibly generate full-body anonymous images with little performance drop on person re-identification tasks.
  • Certified Neural Network Watermarks with Randomized Smoothing. [paper] [code]

    • Arpit Bansal, Ping-yeh Chiang, Michael Curry, Rajiv Jain, Curtis Wigington, Varun Manjunatha, John P Dickerson, Tom Goldstein. ICML 2022
    • Key Word: Watermarking Neural Networks; Certified Defenses; Randomized Smoothing.
    • Digest Watermarking is a commonly used strategy to protect creators' rights to digital images, videos and audio. Recently, watermarking methods have been extended to deep learning models -- in principle, the watermark should be preserved when an adversary tries to copy the model. However, in practice, watermarks can often be removed by an intelligent adversary. Several papers have proposed watermarking methods that claim to be empirically resistant to different types of removal attacks, but these new techniques often fail in the face of new or better-tuned adversaries. In this paper, we propose a certifiable watermarking method. Using the randomized smoothing technique proposed in Chiang et al., we show that our watermark is guaranteed to be unremovable unless the model parameters are changed by more than a certain ℓ2 threshold.
  • RelaxLoss: Defending Membership Inference Attacks without Losing Utility. [paper] [code]

    • Dingfan Chen, Ning Yu, Mario Fritz. ICLR 2022
    • Key Word: Membership Inference Attacks and Defenses.
    • Digest We propose a novel training framework based on a relaxed loss with a more achievable learning target, which leads to narrowed generalization gap and reduced privacy leakage. RelaxLoss is applicable to any classification model with added benefits of easy implementation and negligible overhead.
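    • Sketch A simplified training step in the spirit of the relaxed target: descend on the loss while it is above a target level alpha and ascend once it drops below, so the training loss does not collapse to zero. The full method additionally uses posterior flattening, omitted here; alpha is an assumed hyper-parameter.

      ```python
      import torch.nn.functional as F

      def relaxed_loss_step(model, optimizer, x, y, alpha=1.0):
          """Keep the training loss near a relaxed target instead of driving it to zero,
          narrowing the generalization gap exploited by membership inference."""
          optimizer.zero_grad()
          loss = F.cross_entropy(model(x), y)
          sign = 1.0 if loss.item() >= alpha else -1.0   # gradient ascent once below the target
          (sign * loss).backward()
          optimizer.step()
          return loss.item()
      ```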
  • High-Dimensional Private Empirical Risk Minimization by Greedy Coordinate Descent. [paper]

    • Paul Mangold, Aurélien Bellet, Joseph Salmon, Marc Tommasi.
    • Key Word: Differentially Private Empirical Risk Minimization.
    • Digest In this paper, we study differentially private empirical risk minimization (DP-ERM). It has been shown that the (worst-case) utility of DP-ERM degrades as the dimension increases, which is a major obstacle to privately learning large machine learning models. In high dimensions, it is common for some of a model's parameters to carry more information than others. To exploit this, we propose a differentially private greedy coordinate descent (DP-GCD) algorithm. At each iteration, DP-GCD privately performs a coordinate-wise gradient step along the gradient's (approximately) greatest entry. A minimal sketch of one such step follows below.
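    • Sketch One iteration of the greedy coordinate idea with a report-noisy-max selection step; the Laplace mechanism, the clipping assumed inside grad_fn, and the noise calibration are illustrative assumptions standing in for the paper's exact analysis.

      ```python
      import numpy as np

      def dp_gcd_step(theta, grad_fn, step_size, noise_scale, rng):
          """grad_fn returns a (clipped) gradient at theta; privately pick the approximately
          largest coordinate, then take a noisy step along that single coordinate."""
          g = grad_fn(theta)
          j = int(np.argmax(np.abs(g) + rng.laplace(scale=noise_scale, size=g.shape)))  # report-noisy-max
          theta = theta.copy()
          theta[j] -= step_size * (g[j] + rng.laplace(scale=noise_scale))               # noisy coordinate update
          return theta
      ```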
  • When Does Differentially Private Learning Not Suffer in High Dimensions? [paper]

    • Xuechen Li, Daogao Liu, Tatsunori Hashimoto, Huseyin A. Inan, Janardhan Kulkarni, Yin Tat Lee, Abhradeep Guha Thakurta.
    • Key Word: Differentially Private Learning; Large Language Models.
    • Digest Large pretrained models can be privately fine-tuned to achieve performance approaching that of non-private models. A common theme in these results is the surprising observation that high-dimensional models can achieve favorable privacy-utility trade-offs. This seemingly contradicts known results on the model-size dependence of differentially private convex learning and raises the following research question: When does the performance of differentially private learning not degrade with increasing model size? We identify that the magnitudes of gradients projected onto subspaces is a key factor that determines performance. To precisely characterize this for private convex learning, we introduce a condition on the objective that we term restricted Lipschitz continuity and derive improved bounds for the excess empirical and population risks that are dimension-independent under additional conditions.
  • Measuring Forgetting of Memorized Training Examples. [paper]

    • Matthew Jagielski, Om Thakkar, Florian Tramèr, Daphne Ippolito, Katherine Lee, Nicholas Carlini, Eric Wallace, Shuang Song, Abhradeep Thakurta, Nicolas Papernot, Chiyuan Zhang.
    • Key Word: Membership Inference Attacks; Reconstruction Attacks.
    • Digest We connect memorization and privacy attacks with forgetting. We propose a technique to measure to what extent models "forget" the specifics of training examples, becoming less susceptible to privacy attacks on examples they have not seen recently. We show that, while non-convexity can prevent forgetting from happening in the worst case, standard image and speech models empirically do forget examples over time. We identify nondeterminism as a potential explanation, showing that deterministically trained models do not forget. Our results suggest that examples seen early when training with extremely large datasets -- for instance, those used to pre-train a model -- may observe privacy benefits at the expense of examples seen later.
  • Why patient data cannot be easily forgotten? [paper]

    • Ruolin Su, Xiao Liu, Sotirios A. Tsaftaris. MICCAI 2022
    • Key Word: Privacy; Patient-wise Forgetting; Scrubbing.
    • Digest We study the influence of patient data on model performance and formulate two hypotheses for a patient's data: either they are common and similar to other patients or form edge cases, i.e. unique and rare cases. We show that it is not possible to easily forget patient data. We propose a targeted forgetting approach to perform patient-wise forgetting. Extensive experiments on the benchmark Automated Cardiac Diagnosis Challenge dataset showcase the improved performance of the proposed targeted forgetting approach as opposed to a state-of-the-art method.
  • Approximate Data Deletion in Generative Models. [paper]

    • Zhifeng Kong, Scott Alfeld.
    • Key Word: Density Ratio Based Framework; Machine Unlearning; Generative Model.
    • Digest Many approximate data deletion methods have been developed for supervised learning. Unsupervised learning, in contrast, remains largely an open problem when it comes to (approximate or exact) efficient data deletion. In this paper, we propose a density-ratio-based framework for generative models. Using this framework, we introduce a fast method for approximate data deletion and a statistical test for estimating whether or not training points have been deleted.
  • Data Leakage in Federated Averaging. [paper]

    • Dimitar I. Dimitrov, Mislav Balunović, Nikola Konstantinov, Martin Vechev.
    • Key Word: Federated Learning; Gradient Inversion Attacks and Defenses.
    • Digest Recent attacks have shown that user data can be reconstructed from FedSGD updates, thus breaking privacy. However, these attacks are of limited practical relevance as federated learning typically uses the FedAvg algorithm. It is generally accepted that reconstructing data from FedAvg updates is much harder than FedSGD as: (i) there are unobserved intermediate weight updates, (ii) the order of inputs matters, and (iii) the order of labels changes every epoch. In this work, we propose a new optimization-based attack which successfully attacks FedAvg by addressing the above challenges. First, we solve the optimization problem using automatic differentiation that forces a simulation of the client's update for the reconstructed labels and inputs so as to match the received client update. Second, we address the unknown input order by treating images at different epochs as independent during optimization, while relating them with a permutation invariant prior. Third, we reconstruct the labels by estimating the parameters of existing FedSGD attacks at every FedAvg step.
  • A Framework for Understanding Model Extraction Attack and Defense. [paper]

    • Xun Xian, Mingyi Hong, Jie Ding.
    • Key Word: Model Extraction Attack and Defense.
    • Digest To study the fundamental tradeoffs between model utility from a benign user's view and privacy from an adversary's view, we develop new metrics to quantify such tradeoffs, analyze their theoretical properties, and develop an optimization problem to understand the optimal adversarial attack and defense strategies. The developed concepts and theory match the empirical findings on the "equilibrium" between privacy and utility. In terms of optimization, the key ingredient that enables our results is a unified representation of the attack-defense problem as a min-max bi-level problem.
  • The Privacy Onion Effect: Memorization is Relative. [paper]

    • Nicholas Carlini, Matthew Jagielski, Nicolas Papernot, Andreas Terzis, Florian Tramer, Chiyuan Zhang.
    • Key Word: Memorization; Differential Privacy; Membership Inference Attacks and Defenses; Machine Unlearning.
    • Digest Machine learning models trained on private datasets have been shown to leak their private data. While recent work has found that the average data point is rarely leaked, the outlier samples are frequently subject to memorization and, consequently, privacy leakage. We demonstrate and analyse an Onion Effect of memorization: removing the "layer" of outlier points that are most vulnerable to a privacy attack exposes a new layer of previously-safe points to the same attack. We perform several experiments to study this effect, and understand why it occurs. The existence of this effect has various consequences. For example, it suggests that proposals to defend against memorization without training with rigorous privacy guarantees are unlikely to be effective. Further, it suggests that privacy-enhancing technologies such as machine unlearning could actually harm the privacy of other users.
  • Certified Graph Unlearning. [paper]

    • Eli Chien, Chao Pan, Olgica Milenkovic.
    • Key Word: Machine Unlearning; Certified Data Removal; Graph Neural Networks.
    • Digest Graph-structured data is ubiquitous in practice and often processed using graph neural networks (GNNs). With the adoption of recent laws ensuring the "right to be forgotten", the problem of graph data removal has become of significant importance. To address the problem, we introduce the first known framework for certified graph unlearning of GNNs. In contrast to standard machine unlearning, new analytical and heuristic unlearning challenges arise when dealing with complex graph data. First, three different types of unlearning requests need to be considered, including node feature, edge and node unlearning. Second, to establish provable performance guarantees, one needs to address challenges associated with feature mixing during propagation. The underlying analysis is illustrated on the example of simple graph convolutions (SGC) and their generalized PageRank (GPR) extensions, thereby laying the theoretical foundation for certified unlearning of GNNs.
  • Fully Privacy-Preserving Federated Representation Learning via Secure Embedding Aggregation. [paper]

    • Jiaxiang Tang, Jinbao Zhu, Songze Li, Kai Zhang, Lichao Sun.
    • Key Word: Federated Learning; Privacy.
    • Digest We consider a federated representation learning framework, where with the assistance of a central server, a group of N distributed clients train collaboratively over their private data, for the representations (or embeddings) of a set of entities (e.g., users in a social network). Under this framework, for the key step of aggregating local embeddings trained at the clients in a private manner, we develop a secure embedding aggregation protocol named SecEA, which provides information-theoretical privacy guarantees for the set of entities and the corresponding embeddings at each client simultaneously, against a curious server and up to T < N/2 colluding clients.
  • I Know What You Trained Last Summer: A Survey on Stealing Machine Learning Models and Defences. [paper]

    • Daryna Oliynyk, Rudolf Mayer, Andreas Rauber.
    • Key Word: Model Extraction Attacks; Survey.
    • Digest Adversaries can create a copy of the model with (almost) identical behavior using only the prediction labels. While many variants of this attack have been described, only scattered defence strategies have been proposed, addressing isolated threats. This raises the necessity for a thorough systematisation of the field of model stealing, to arrive at a comprehensive understanding of why these attacks are successful, and how they could be holistically defended against. We address this by categorising and comparing model stealing attacks, assessing their performance, and exploring corresponding defence techniques in different settings.
  • Reconstructing Training Data from Trained Neural Networks. [paper] [code]

    • Niv Haim, Gal Vardi, Gilad Yehudai, Ohad Shamir, Michal Irani.
    • Key Word: Reconstruction Attacks.
    • Digest We propose a novel reconstruction scheme that stems from recent theoretical results about the implicit bias in training neural networks with gradient-based methods. To the best of our knowledge, our results are the first to show that reconstructing a large portion of the actual training samples from a trained neural network classifier is generally possible. This has negative implications on privacy, as it can be used as an attack for revealing sensitive training data. We demonstrate our method for binary MLP classifiers on a few standard computer vision datasets.
  • A Survey on Gradient Inversion: Attacks, Defenses and Future Directions. [paper]

    • Rui Zhang, Song Guo, Junxiao Wang, Xin Xie, Dacheng Tao. IJCAI 2022
    • Key Word: Gradient Inversion Attacks and Defenses; Survey.
    • Digest Recent studies have shown that the training samples can be recovered from gradients, which are called Gradient Inversion (GradInv) attacks. However, there remains a lack of extensive surveys covering recent advances and thorough analysis of this issue. In this paper, we present a comprehensive survey on GradInv, aiming to summarize the cutting-edge research and broaden the horizons for different domains.
  • Self-Supervised Pretraining for Differentially Private Learning. [paper]

    • Arash Asadian, Evan Weidner, Lei Jiang.
    • Key Word: Self-Supervised Pretraining; Differential Privacy.
    • Digest We demonstrate self-supervised pretraining (SSP) is a scalable solution to deep learning with differential privacy (DP) regardless of the size of available public datasets in image classification. When facing the lack of public datasets, we show the features generated by SSP on only a single image enable a private classifier to obtain much better utility than the non-learned handcrafted features under the same privacy budget. When a moderate or large size public dataset is available, the features produced by SSP greatly outperform the features trained with labels on various complex private datasets under the same privacy budget.
  • PrivHAR: Recognizing Human Actions From Privacy-preserving Lens. [paper]

    • Carlos Hinojosa, Miguel Marquez, Henry Arguello, Ehsan Adeli, Li Fei-Fei, Juan Carlos Niebles.
    • Key Word: Privacy-Preserving Lens Design; Human Action Recognition; Adversarial Training; Deep Optics.
    • Digest The accelerated use of digital cameras prompts an increasing concern about privacy and security, particularly in applications such as action recognition. In this paper, we propose an optimizing framework to provide robust visual privacy protection along the human action recognition pipeline. Our framework parameterizes the camera lens to successfully degrade the quality of the videos to inhibit privacy attributes and protect against adversarial attacks while maintaining relevant features for activity recognition. We validate our approach with extensive simulations and hardware experiments.
  • Data Stealing Attack on Medical Images: Is it Safe to Export Networks from Data Lakes? [paper]

    • Huiyu Li, Nicholas Ayache, Hervé Delingette.
    • Key Word: Data Stealing Attacks; Medical Imaging; Healthcare.
    • Digest We introduce the concept of data stealing attack during the export of neural networks. It consists in hiding some information in the exported network that allows the reconstruction outside the data lake of images initially stored in that data lake. More precisely, we show that it is possible to train a network that can perform lossy image compression and at the same time solve some utility tasks such as image segmentation.
  • On the Privacy Properties of GAN-generated Samples. [paper]

    • Zinan Lin, Vyas Sekar, Giulia Fanti. AISTATS 2021
    • Key Word: Generative Adversarial Nets; Differential Privacy; Membership Inference Attacks.
    • Digest The privacy implications of generative adversarial networks (GANs) are a topic of great interest, leading to several recent algorithms for training GANs with privacy guarantees. By drawing connections to the generalization properties of GANs, we prove that under some assumptions, GAN-generated samples inherently satisfy some (weak) privacy guarantees. First, we show that if a GAN is trained on m samples and used to generate n samples, the generated samples are (epsilon, delta)-differentially-private for (epsilon, delta) pairs where delta scales as O(n/m). We show that under some special conditions, this upper bound is tight. Next, we study the robustness of GAN-generated samples to membership inference attacks. We model membership inference as a hypothesis test in which the adversary must determine whether a given sample was drawn from the training dataset or from the underlying data distribution.
  • Defense Against Gradient Leakage Attacks via Learning to Obscure Data. [paper]

    • Yuxuan Wan, Han Xu, Xiaorui Liu, Jie Ren, Wenqi Fan, Jiliang Tang.
    • Key Word: Gradient Leakage Defenses.
    • Digest We propose a new defense method to protect the privacy of clients' data by learning to obscure data. Our defense method can generate synthetic samples that are totally distinct from the original samples, but they can also maximally preserve their predictive features and guarantee the model performance. Furthermore, our defense strategy makes it extremely difficult for the gradient leakage attack and its variants to reconstruct the client data.
  • Dataset Distillation using Neural Feature Regression. [paper]

    • Yongchao Zhou, Ehsan Nezhadarya, Jimmy Ba.
    • Key Word: Dataset Condensation; Continual Learning; Membership Inference Defenses.
    • Digest We address these challenges using neural Feature Regression with Pooling (FRePo), achieving state-of-the-art performance with an order of magnitude lower memory requirement and two orders of magnitude faster training than previous methods. The proposed algorithm is analogous to truncated backpropagation through time with a pool of models to alleviate various types of overfitting in dataset distillation. FRePo significantly outperforms the previous methods on CIFAR100, Tiny ImageNet, and ImageNet-1K. Furthermore, we show that high-quality distilled data can greatly improve various downstream applications, such as continual learning and membership inference defense.
  • Federated Learning in Non-IID Settings Aided by Differentially Private Synthetic Data. [paper]

    • Huancheng Chen, Haris Vikalo.
    • Key Word: Federated Learning; Differential Privacy.
    • Digest We propose FedDPMS (Federated Differentially Private Means Sharing), an FL algorithm in which clients deploy variational auto-encoders to augment local datasets with data synthesized using differentially private means of latent data representations communicated by a trusted server. Such augmentation ameliorates effects of data heterogeneity across the clients without compromising privacy.
  • FETA: Fairness Enforced Verifying, Training, and Predicting Algorithms for Neural Networks. [paper]

    • Kiarash Mohammadi, Aishwarya Sivaraman, Golnoosh Farnadi.
    • Key Word: Fairness; Verification.
    • Digest We study the problem of verifying, training, and guaranteeing individual fairness of neural network models. A popular approach for enforcing fairness is to translate a fairness notion into constraints over the parameters of the model. However, such a translation does not always guarantee fair predictions of the trained neural network model. To address this challenge, we develop a counterexample-guided post-processing technique to provably enforce fairness constraints at prediction time.
  • Privacy for Free: How does Dataset Condensation Help Privacy? [paper]

    • Tian Dong, Bo Zhao, Lingjuan Lyu. ICML 2022
    • Key Word: Privacy; Dataset Condensation.
    • Digest We identify, for the first time, that dataset condensation (DC), originally designed for improving training efficiency, is also a better solution to replace the traditional data generators for private data generation, thus providing privacy for free. To demonstrate the privacy benefit of DC, we build a connection between DC and differential privacy, and theoretically prove on linear feature extractors (and then extend to non-linear feature extractors) that the existence of one sample has limited impact (O(m/n)) on the parameter distribution of networks trained on m samples synthesized from n (n≫m) raw samples by DC.
  • Benign Overparameterization in Membership Inference with Early Stopping. [paper]

    • Jasper Tan, Daniel LeJeune, Blake Mason, Hamid Javadi, Richard G. Baraniuk.
    • Key Word: Benign Overparameterization; Membership Inference Attacks; Early Stopping.
    • Digest Does a neural network's privacy have to be at odds with its accuracy? In this work, we study the effects the number of training epochs and parameters have on a neural network's vulnerability to membership inference (MI) attacks, which aim to extract potentially private information about the training data. We first demonstrate how the number of training epochs and parameters individually induce a privacy-utility trade-off: more of either improves generalization performance at the expense of lower privacy. However, remarkably, we also show that jointly tuning both can eliminate this privacy-utility trade-off.
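    A minimal loss-threshold membership inference baseline of the kind used to probe this trade-off: members of the training set tend to have lower loss than held-out points, so the per-example loss can serve as a membership score. The deliberately under-regularized toy model and the AUC-based evaluation are illustrative assumptions.

    ```python
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 50))
    y = (X[:, 0] + rng.normal(size=200) > 0).astype(int)
    X_in, y_in, X_out, y_out = X[:100], y[:100], X[100:], y[100:]

    # Lightly regularized model on few samples, to mimic long training / overfitting.
    model = LogisticRegression(C=100.0, max_iter=5000).fit(X_in, y_in)

    def per_example_loss(model, X, y):
        p = model.predict_proba(X)[np.arange(len(y)), y]
        return -np.log(np.clip(p, 1e-12, 1.0))

    # Members tend to have lower loss; use the negative loss as the membership score.
    scores = np.concatenate([-per_example_loss(model, X_in, y_in),
                             -per_example_loss(model, X_out, y_out)])
    is_member = np.concatenate([np.ones(100), np.zeros(100)])
    print("loss-threshold MI attack AUC:", round(roc_auc_score(is_member, scores), 3))
    ```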
  • FaceMAE: Privacy-Preserving Face Recognition via Masked Autoencoders. [paper] [code]

    • Kai Wang, Bo Zhao, Xiangyu Peng, Zheng Zhu, Jiankang Deng, Xinchao Wang, Hakan Bilen, Yang You.
    • Key Word: Privacy; Face Recognition.
    • Digest We propose a novel framework FaceMAE, where the face privacy and recognition performance are considered simultaneously. Firstly, randomly masked face images are used to train the reconstruction module in FaceMAE. We tailor the instance relation matching (IRM) module to minimize the distribution gap between real faces and FaceMAE reconstructed ones. During the deployment phase, we use trained FaceMAE to reconstruct images from masked faces of unseen identities without extra training.
  • Unlocking High-Accuracy Differentially Private Image Classification through Scale. [paper] [code]

    • Soham De, Leonard Berrada, Jamie Hayes, Samuel L. Smith, Borja Balle.
    • Key Word: Differential Privacy; Image Classification.
    • Digest Differential Privacy (DP) provides a formal privacy guarantee preventing adversaries with access to a machine learning model from extracting information about individual training points. Differentially Private Stochastic Gradient Descent (DP-SGD), the most popular DP training method for deep learning, realizes this protection by injecting noise during training. However, previous works have found that DP-SGD often leads to a significant degradation in performance on standard image classification benchmarks. Furthermore, some authors have postulated that DP-SGD inherently performs poorly on large models, since the norm of the noise required to preserve privacy is proportional to the model dimension. In contrast, we demonstrate that DP-SGD on over-parameterized models can perform significantly better than previously thought. Combining careful hyper-parameter tuning with simple techniques to ensure signal propagation and improve the convergence rate, we obtain a new SOTA of 81.4% on CIFAR-10 without extra data under (8, 10^{-5})-DP using a 40-layer Wide-ResNet, improving over the previous SOTA of 71.7%.
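    A minimal sketch of the DP-SGD mechanism referred to above: clip each example's gradient and add Gaussian noise to the clipped sum before updating. Logistic regression in NumPy keeps the per-example gradients explicit; the clipping norm, noise multiplier and learning rate are illustrative, and no (epsilon, delta) accounting is performed.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 512, 10
    X = rng.normal(size=(n, d))
    y = (X @ rng.normal(size=d) > 0).astype(float)

    def dp_sgd(X, y, steps=200, lr=0.5, batch=64, clip=1.0, noise_mult=1.0):
        """Logistic regression trained with per-example clipping + Gaussian noise."""
        w = np.zeros(X.shape[1])
        for _ in range(steps):
            idx = rng.choice(len(y), size=batch, replace=False)
            p = 1.0 / (1.0 + np.exp(-X[idx] @ w))
            per_example_grads = (p - y[idx])[:, None] * X[idx]           # shape (batch, d)
            norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
            clipped = per_example_grads / np.maximum(1.0, norms / clip)  # clip each example to `clip`
            noisy_sum = clipped.sum(axis=0) + rng.normal(scale=noise_mult * clip, size=d)
            w -= lr * noisy_sum / batch
        return w

    w = dp_sgd(X, y)
    accuracy = ((1.0 / (1.0 + np.exp(-X @ w)) > 0.5) == y.astype(bool)).mean()
    print("train accuracy under the DP-SGD sketch:", round(float(accuracy), 3))
    ```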
  • SPAct: Self-supervised Privacy Preservation for Action Recognition. [paper] [code]

    • Ishan Rajendrakumar Dave, Chen Chen, Mubarak Shah. CVPR 2022
    • Key Word: Self-Supervision; Privacy; Action Recognition.
    • Digest Recent developments of self-supervised learning (SSL) have unleashed the untapped potential of the unlabeled data. For the first time, we present a novel training framework which removes privacy information from input video in a self-supervised manner without requiring privacy labels. Our training framework consists of three main components: anonymization function, self-supervised privacy removal branch, and action recognition branch. We train our framework using a minimax optimization strategy to minimize the action recognition cost function and maximize the privacy cost function through a contrastive self-supervised loss.
  • Robust Unlearnable Examples: Protecting Data Against Adversarial Learning. [paper] [code]

    • Shaopeng Fu, Fengxiang He, Yang Liu, Li Shen, Dacheng Tao. ICLR 2022
    • Key Word: Privacy; Adversarial Training.
    • Digest We first find that the vanilla error-minimizing noise, which suppresses the informative knowledge of data via minimizing the corresponding training loss, could not effectively minimize the adversarial training loss. This explains the vulnerability of error-minimizing noise in adversarial training. Based on the observation, robust error-minimizing noise is then introduced to reduce the adversarial training loss.
  • Quantifying Memorization Across Neural Language Models. [paper]

    • Nicholas Carlini, Daphne Ippolito, Matthew Jagielski, Katherine Lee, Florian Tramer, Chiyuan Zhang.
    • Key Word: Reconstruction Attacks; Membership Inference Attacks; Language Models.
    • Digest We describe three log-linear relationships that quantify the degree to which LMs emit memorized training data. Memorization significantly grows as we increase (1) the capacity of a model, (2) the number of times an example has been duplicated, and (3) the number of tokens of context used to prompt the model. Surprisingly, we find the situation becomes complicated when generalizing these results across model families. On the whole, we find that memorization in LMs is more prevalent than previously believed and will likely get worse as models continue to scale, at least without active mitigations.
  • What Does it Mean for a Language Model to Preserve Privacy? [paper]

    • Hannah Brown, Katherine Lee, Fatemehsadat Mireshghallah, Reza Shokri, Florian Tramèr. FAccT 2022
    • Key Word: Natural Language Processing; Differential Privacy; Data Sanitization.
    • Digest We discuss the mismatch between the narrow assumptions made by popular data protection techniques (data sanitization and differential privacy), and the broadness of natural language and of privacy as a social norm. We argue that existing protection methods cannot guarantee a generic and meaningful notion of privacy for language models. We conclude that language models should be trained on text data which was explicitly produced for public use.
  • Bounding Training Data Reconstruction in Private (Deep) Learning. [paper] [code]

    • Chuan Guo, Brian Karrer, Kamalika Chaudhuri, Laurens van der Maaten. ICML 2022
    • Key Word: Reconstruction Attacks; Differential Privacy.
    • Digest Existing semantic guarantees for DP focus on membership inference, which may overestimate the adversary's capabilities and is not applicable when membership status itself is non-sensitive. In this paper, we derive the first semantic guarantees for DP mechanisms against training data reconstruction attacks under a formal threat model. We show that two distinct privacy accounting methods -- Renyi differential privacy and Fisher information leakage -- both offer strong semantic protection against data reconstruction attacks.
  • Variational Model Inversion Attacks. [paper] [code]

    • Kuan-Chieh Wang, Yan Fu, Ke Li, Ashish Khisti, Richard Zemel, Alireza Makhzani. NeurIPS 2021
    • Key Word: Model Inversion Attacks.
    • Digest We provide a probabilistic interpretation of model inversion attacks, and formulate a variational objective that accounts for both diversity and accuracy. In order to optimize this variational objective, we choose a variational family defined in the code space of a deep generative model, trained on a public auxiliary dataset that shares some structural similarity with the target dataset.
  • Plug & Play Attacks: Towards Robust and Flexible Model Inversion Attacks. [paper] [code]

    • Lukas Struppek, Dominik Hintersdorf, Antonio De Almeida Correia, Antonia Adler, Kristian Kersting. ICML 2022
    • Key Word: Model Inversion Attacks.
    • Digest Previous research has developed generative MIAs that use generative adversarial networks (GANs) as image priors tailored to a specific target model. This makes the attacks time- and resource-consuming, inflexible, and susceptible to distributional shifts between datasets. To overcome these drawbacks, we present Plug & Play Attacks, which relax the dependency between the target model and image prior, and enable the use of a single GAN to attack a wide range of targets, requiring only minor adjustments to the attack.
  • Reconstructing Training Data with Informed Adversaries. [paper] [code]

    • Borja Balle, Giovanni Cherubin, Jamie Hayes. S&P 2022
    • Key Word: Reconstruction Attacks; Differential Privacy.
    • Digest This work studies this question from the lens of a powerful informed adversary who knows all the training data points except one. By instantiating concrete attacks, we show it is feasible to reconstruct the remaining data point in this stringent threat model. For convex models (e.g. logistic regression), reconstruction attacks are simple and can be derived in closed-form. For more general models (e.g. neural networks), we propose an attack strategy based on training a reconstructor network that receives as input the weights of the model under attack and produces as output the target data point.

Fairness

  • FARE: Provably Fair Representation Learning. [paper]

    • Nikola Jovanović, Mislav Balunović, Dimitar I. Dimitrov, Martin Vechev.
    • Key Word: Provable Fairness; Group Fairness.
    • Digest Recent work has shown that prior methods achieve worse accuracy-fairness tradeoffs than originally suggested by their results. This dictates the need for fair representation learning (FRL) methods that provide provable upper bounds on the unfairness of any downstream classifier, a challenge yet unsolved. In this work we address this challenge and propose Fairness with Restricted Encoders (FARE), the first FRL method with provable fairness guarantees. Our key insight is that restricting the representation space of the encoder enables us to derive suitable fairness guarantees, while allowing empirical accuracy-fairness tradeoffs comparable to prior work.
  • Fairness via Adversarial Attribute Neighbourhood Robust Learning. [paper]

    • Qi Qi, Shervin Ardeshir, Yi Xu, Tianbao Yang.
    • Key Word: Fairness; Robust Loss; Invariant Learning.
    • Digest To ensure that the model performs uniformly well across different sensitive attributes, we propose a principled Robust Adversarial Attribute Neighbourhood (RAAN) loss to debias the classification head and promote a fairer representation distribution across different sensitive attribute groups. The key idea of RAAN is to mitigate the differences of biased representations between different sensitive attribute groups by assigning each sample an adversarial robust weight, which is defined on the representations of adversarial attribute neighbors, i.e., the samples from different protected groups.
  • MEDFAIR: Benchmarking Fairness for Medical Imaging. [paper] [code]

    • Yongshuo Zong, Yongxin Yang, Timothy Hospedales.
    • Key Word: Fairness; Medical Imaging.
    • Digest We introduce MEDFAIR, a framework to benchmark the fairness of machine learning models for medical imaging. MEDFAIR covers eleven algorithms from various categories, nine datasets from different imaging modalities, and three model selection criteria. Through extensive experiments, we find that the under-studied issue of model selection criterion can have a significant impact on fairness outcomes; while in contrast, state-of-the-art bias mitigation algorithms do not significantly improve fairness outcomes over empirical risk minimization (ERM) in both in-distribution and out-of-distribution settings.
  • A Survey of Fairness in Medical Image Analysis: Concepts, Algorithms, Evaluations, and Challenges. [paper]

    • Zikang Xu, Jun Li, Qingsong Yao, Han Li, Xin Shi, S. Kevin Zhou.
    • Key Word: Survey; Fairness; Medical Image Analysis; Healthcare.
    • Digest We first give a comprehensive and precise definition of fairness, followed by an introduction of the techniques currently used to address fairness issues in MedIA. After that, we list public medical image datasets that contain demographic attributes for facilitating fairness research and summarize current algorithms concerning fairness in MedIA.
  • Fairness Reprogramming. [paper]

    • Guanhua Zhang, Yihua Zhang, Yang Zhang, Wenqi Fan, Qing Li, Sijia Liu, Shiyu Chang.
    • Key Word: Model reprogramming; Fairness.
    • Digest We propose a new generic fairness learning paradigm, called FairReprogram, which incorporates the model reprogramming technique. Specifically, FairReprogram considers the neural model fixed, and instead appends to the input a set of perturbations, called the fairness trigger, which is tuned towards the fairness criteria under a min-max formulation.
  • Fairness and robustness in anti-causal prediction. [paper]

    • Maggie Makar, Alexander D'Amour.
    • Key Word: Distribution Shifts; Causal Structure.
    • Digest Robustness to distribution shift and fairness have independently emerged as two important desiderata required of modern machine learning models. While these two desiderata seem related, the connection between them is often unclear in practice. Here, we discuss these connections through a causal lens, focusing on anti-causal prediction tasks, where the input to a classifier (e.g., an image) is assumed to be generated as a function of the target label and the protected attribute. By taking this perspective, we draw explicit connections between a common fairness criterion - separation - and a common notion of robustness - risk invariance.
  • It's Not Fairness, and It's Not Fair: The Failure of Distributional Equality and the Promise of Relational Equality in Complete-Information Hiring Games. [paper]

    • Benjamin Fish, Luke Stark.
    • Key Word: Group Fairness; Individual Fairness; Causal Fairness.
    • Digest Existing efforts to formulate computational definitions of fairness have largely focused on distributional notions of equality, where equality is defined by the resources or decisions given to individuals in the system. Yet existing discrimination and injustice is often the result of unequal social relations, rather than an unequal distribution of resources. Here, we show how optimizing for existing computational and economic definitions of fairness and equality fail to prevent unequal social relations. To do this, we provide an example of a self-confirming equilibrium in a simple hiring market that is relationally unequal but satisfies existing distributional notions of fairness.
  • Sustaining Fairness via Incremental Learning. [paper]

    • Somnath Basu Roy Chowdhury, Snigdha Chaturvedi.
    • Key Word: Fairness; Incremental Learning.
    • Digest We propose to address this issue by introducing the problem of learning fair representations in an incremental learning setting. To this end, we present Fairness-aware Incremental Representation Learning (FaIRL), a representation learning system that can sustain fairness while incrementally learning new tasks. FaIRL is able to achieve fairness and learn new tasks by controlling the rate-distortion function of the learned representations.
  • Bugs in the Data: How ImageNet Misrepresents Biodiversity. [paper] [code]

    • Alexandra Sasha Luccioni, David Rolnick.
    • Key Word: ImageNet Bias.
    • Digest We analyze the 13,450 images from 269 classes that represent wild animals in the ImageNet-1k validation set, with the participation of expert ecologists. We find that many of the classes are ill-defined or overlapping, and that 12% of the images are incorrectly labeled, with some classes having >90% of images incorrect. We also find that both the wildlife-related labels and images included in ImageNet-1k present significant geographical and cultural biases, as well as ambiguities such as artificial animals, multiple species in the same image, or the presence of humans.
  • Discover and Mitigate Unknown Biases with Debiasing Alternate Networks. [paper] [code]

    • Zhiheng Li, Anthony Hoogs, Chenliang Xu. ECCV 2022
    • Key Word: Bias Identification; Bias Mitigation; Fairness; Unsupervised Debiasing.
    • Digest Deep image classifiers have been found to learn biases from datasets. To mitigate the biases, most previous methods require labels of protected attributes (e.g., age, skin tone) as full-supervision, which has two limitations: 1) it is infeasible when the labels are unavailable; 2) they are incapable of mitigating unknown biases -- biases that humans do not preconceive. To resolve those problems, we propose Debiasing Alternate Networks (DebiAN), which comprises two networks -- a Discoverer and a Classifier. By training in an alternate manner, the discoverer tries to find multiple unknown biases of the classifier without any annotations of biases, and the classifier aims at unlearning the biases identified by the discoverer.
  • Mitigating Algorithmic Bias with Limited Annotations. [paper] [code]

    • Guanchu Wang, Mengnan Du, Ninghao Liu, Na Zou, Xia Hu.
    • Key Word: Fairness; Active Bias Mitigation; Limited Annotations.
    • Digest When sensitive attributes are not disclosed or available, a small part of the training data needs to be manually annotated to mitigate bias. However, the skewed distribution across different sensitive groups preserves the skewness of the original dataset in the annotated subset, which leads to non-optimal bias mitigation. To tackle this challenge, we propose Active Penalization Of Discrimination (APOD), an interactive framework to guide the limited annotations towards maximally eliminating the effect of algorithmic bias. The proposed APOD integrates discrimination penalization with active instance selection to efficiently utilize the limited annotation budget, and it is theoretically proved to be capable of bounding the algorithmic bias.
  • Bias Mitigation for Machine Learning Classifiers: A Comprehensive Survey. [paper]

    • Max Hort, Zhenpeng Chen, Jie M. Zhang, Federica Sarro, Mark Harman.
    • Key Word: Bias Mitigation; Fairness; Survey.
    • Digest This paper provides a comprehensive survey of bias mitigation methods for achieving fairness in Machine Learning (ML) models. We collect a total of 234 publications concerning bias mitigation for ML classifiers. These methods can be distinguished based on their intervention procedure (i.e., pre-processing, in-processing, post-processing) and the technology they apply. We also investigate how existing bias mitigation methods are evaluated in the literature, considering in particular the datasets, metrics, and benchmarks used, and reflect on insights such as which fairness metrics are most popular and how many datasets are typically used for evaluating bias mitigation methods.
  • Fair Machine Learning in Healthcare: A Review. [paper]

    • Qizhang Feng, Mengnan Du, Na Zou, Xia Hu.
    • Key Word: Fairness; Healthcare; Survey.
    • Digest Benefiting from the digitization of healthcare data and the development of computing power, machine learning methods are increasingly used in the healthcare domain. Fairness problems have been identified in machine learning for healthcare, resulting in an unfair allocation of limited healthcare resources or excessive health risks for certain groups. Therefore, addressing the fairness problems has recently attracted increasing attention from the healthcare community. However, the intersection of machine learning for healthcare and fairness in machine learning remains understudied. In this review, we build the bridge by exposing fairness problems, summarizing possible biases, sorting out mitigation methods and pointing out challenges along with opportunities for the future.
  • Transferring Fairness under Distribution Shifts via Fair Consistency Regularization. [paper]

    • Bang An, Zora Che, Mucong Ding, Furong Huang.
    • Key Word: Fairness; Distribution Shifts; Regularization.
    • Digest We study how to transfer model fairness under distribution shifts, a widespread issue in practice. We conduct a fine-grained analysis of how the fair model is affected under different types of distribution shifts and find that domain shifts are more challenging than subpopulation shifts. Inspired by the success of self-training in transferring accuracy under domain shifts, we derive a sufficient condition for transferring group fairness. Guided by it, we propose a practical algorithm with a fair consistency regularization as the key component.
  • Input-agnostic Certified Group Fairness via Gaussian Parameter Smoothing. [paper]

    • Jiayin Jin, Zeru Zhang, Yang Zhou, Lingfei Wu.
    • Key Word: Fair Classification; Group Fairness.
    • Digest Only recently have researchers attempted to provide classification algorithms with provable group fairness guarantees. Most of these algorithms are hampered by the requirement that the training and deployment data follow the same distribution. This paper proposes an input-agnostic certified group fairness algorithm, FairSmooth, for improving the fairness of classification models while maintaining the remarkable prediction accuracy. A Gaussian parameter smoothing method is developed to transform base classifiers into their smooth versions. An optimal individual smooth classifier is learnt for each group using only the data of that group, and an overall smooth classifier for all groups is generated by averaging the parameters of all the individual smooth ones.
  • FairGrad: Fairness Aware Gradient Descent. [paper]

    • Gaurav Maheshwari, Michaël Perrot.
    • Key Word: Group Fairness.
    • Digest We propose FairGrad, a method to enforce fairness based on a reweighting scheme that iteratively learns group specific weights based on whether they are advantaged or not. FairGrad is easy to implement and can accommodate various standard fairness definitions. Furthermore, we show that it is comparable to standard baselines over various datasets including ones used in natural language processing and computer vision.
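    A rough sketch of group reweighting in the spirit of FairGrad: after each round, groups that are currently advantaged (higher accuracy than average) have their sample weights decreased, and disadvantaged groups have them increased. The exponential update rule and the step size eta are illustrative assumptions, not the paper's exact scheme.

    ```python
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n = 1000
    g = rng.integers(0, 2, size=n)                      # sensitive group membership
    X = rng.normal(size=(n, 5)) + 0.5 * g[:, None]
    y = (X[:, 0] + 0.3 * g + 0.5 * rng.normal(size=n) > 0).astype(int)

    weights, eta = np.ones(n), 0.5
    for _ in range(10):
        clf = LogisticRegression(max_iter=1000).fit(X, y, sample_weight=weights)
        pred = clf.predict(X)
        acc_by_group = np.array([(pred[g == k] == y[g == k]).mean() for k in (0, 1)])
        advantage = acc_by_group - acc_by_group.mean()  # > 0 means the group is advantaged
        weights *= np.exp(-eta * advantage[g])          # up-weight samples from disadvantaged groups
        weights *= n / weights.sum()                    # keep the weights normalized
    print("per-group accuracy after reweighting:", np.round(acc_by_group, 3))
    ```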
  • Active Fairness Auditing. [paper]

    • Tom Yan, Chicheng Zhang. ICML 2022
    • Key Word: Active Learning.
    • Digest The fast spreading adoption of machine learning (ML) by companies across industries poses significant regulatory challenges. One such challenge is scalability: how can regulatory bodies efficiently audit these ML models, ensuring that they are fair? In this paper, we initiate the study of query-based auditing algorithms that can estimate the demographic parity of ML models in a query-efficient manner. We propose an optimal deterministic algorithm, as well as a practical randomized, oracle-efficient algorithm with comparable guarantees.
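    A simple query-based estimate of demographic parity, the quantity such an audit targets: spend a query budget on sampled inputs, compare positive-prediction rates across groups, and attach a crude confidence interval. The uniform sampling, the Hoeffding-style bound, and the toy black box are illustrative; the paper's algorithms are designed to be far more query-efficient.

    ```python
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(2000, 4))
    g = rng.integers(0, 2, size=2000)
    y = (X[:, 0] + 0.4 * g > 0).astype(int)
    black_box = LogisticRegression(max_iter=1000).fit(np.c_[X, g], y)   # the model being audited

    def audit_demographic_parity(model, pool_X, pool_g, budget=300):
        """Estimate the demographic parity gap from `budget` queries to the black box."""
        idx = rng.choice(len(pool_g), size=budget, replace=False)
        preds = model.predict(np.c_[pool_X[idx], pool_g[idx]])
        rates = [preds[pool_g[idx] == k].mean() for k in (0, 1)]
        gap = abs(rates[0] - rates[1])
        half_width = np.sqrt(np.log(2 / 0.05) / budget)   # crude Hoeffding-style interval
        return gap, half_width

    gap, hw = audit_demographic_parity(black_box, X, g, budget=300)
    print(f"estimated demographic parity gap: {gap:.3f} +/- {hw:.3f}")
    ```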
  • What-Is and How-To for Fairness in Machine Learning: A Survey, Reflection, and Perspective. [paper]

    • Zeyu Tang, Jiji Zhang, Kun Zhang.
    • Key Word: Fairness; Causality; Bias Mitigation; Survey.
    • Digest Algorithmic fairness has attracted increasing attention in the machine learning community. Various definitions are proposed in the literature, but the differences and connections among them are not clearly addressed. In this paper, we review and reflect on various fairness notions previously proposed in machine learning literature, and make an attempt to draw connections to arguments in moral and political philosophy, especially theories of justice. We also consider fairness inquiries from a dynamic perspective, and further consider the long-term impact that is induced by current prediction and decision.
  • How unfair is private learning? [paper]

    • Amartya Sanyal, Yaxi Hu, Fanny Yang. UAI 2022
    • Key Word: Fairness; Privacy.
    • Digest As machine learning algorithms are deployed on sensitive data in critical decision making processes, it is becoming increasingly important that they are also private and fair. In this paper, we show that, when the data has a long-tailed structure, it is not possible to build accurate learning algorithms that are both private and achieve high accuracy on minority subpopulations. We further show that relaxing overall accuracy can lead to good fairness even with strict privacy requirements.
  • DebiasBench: Benchmark for Fair Comparison of Debiasing in Image Classification. [paper]

    • Jungsoo Lee, Juyoung Lee, Sanghun Jung, Jaegul Choo.
    • Key Word: Debiasing; Image Classification; Benchmark.
    • Digest The goal of this paper is to standardize the inconsistent experimental settings and propose a consistent model parameter selection criterion for debiasing. Based on such unified experimental settings and model parameter selection criterion, we build a benchmark named DebiasBench which includes five datasets and seven debiasing methods. We carefully conduct extensive experiments in various aspects and show that different state-of-the-art methods work best on different datasets. Even the vanilla method, i.e., the one with no debiasing module, shows competitive results on datasets with low bias severity.
  • How Biased is Your Feature?: Computing Fairness Influence Functions with Global Sensitivity Analysis. [paper]

    • Bishwamittra Ghosh, Debabrota Basu, Kuldeep S. Meel.
    • Key Word: Fairness; Influence Function.
    • Digest We aim to quantify the influence of different features on the bias of a classifier. To this end, we propose a framework of Fairness Influence Function (FIF), and compute it as a scaled difference of conditional variances in the prediction of the classifier. We also instantiate an algorithm, FairXplainer, that uses variance decomposition among the subset of features and a local regressor to compute FIFs accurately, while also capturing the intersectional effects of the features.
  • Fairness Transferability Subject to Bounded Distribution Shift. [paper]

    • Yatong Chen, Reilly Raab, Jialu Wang, Yang Liu.
    • Key Word: Fairness; Distribution Shift.
    • Digest We study the transferability of statistical group fairness for machine learning predictors (i.e., classifiers or regressors) subject to bounded distribution shift, a phenomenon frequently caused by user adaptation to a deployed model or a dynamic environment. Herein, we develop a bound characterizing such transferability, flagging potentially inappropriate deployments of machine learning for socially consequential tasks.
  • Inducing bias is simpler than you think. [paper]

    • Stefano Sarao Mannelli, Federica Gerace, Negar Rostamzadeh, Luca Saglietti.
    • Key Word: Fairness.
    • Digest We introduce a solvable high-dimensional model of data imbalance, where parametric control over the many bias-inducing factors allows for an extensive exploration of the bias inheritance mechanism. Through the tools of statistical physics, we analytically characterise the typical behaviour of learning models trained in our synthetic framework and find similar unfairness behaviours as those observed on more realistic data. However, we also identify a positive transfer effect between the different subpopulations within the data. This suggests that mixing data with different statistical properties could be helpful, provided the learning model is made aware of this structure.
  • Mitigating Dataset Bias by Using Per-sample Gradient. [paper]

    • Sumyeong Ahn, Seongyoon Kim, Se-young Yun.
    • Key Word: Debiasing; Benchmark; Invariant Learning.
    • Digest We propose a debiasing algorithm, called PGD (Per-sample Gradient-based Debiasing), that comprises three steps: (1) training a model on uniform batch sampling, (2) setting the importance of each sample in proportion to the norm of the sample gradient, and (3) training the model using importance-batch sampling, whose probability is obtained in step (2). Compared with existing baselines on various synthetic and real-world datasets, the proposed method showed state-of-the-art accuracy for the classification task. Furthermore, we provide a theoretical understanding of how PGD can mitigate dataset bias.
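    A toy sketch of the three steps listed above on a synthetic dataset with a spurious attribute: train uniformly, score each sample by its per-sample gradient norm, then retrain with sampling probabilities proportional to those norms. Logistic regression and the exact weighting are simplifying assumptions.

    ```python
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n = 2000
    bias = rng.integers(0, 2, size=n)                      # spurious attribute
    y = np.where(rng.random(n) < 0.95, bias, 1 - bias)     # labels mostly follow the bias
    X = np.c_[bias + 0.1 * rng.normal(size=n),             # easy, biased feature
              y + 0.8 * rng.normal(size=n)]                # true but noisy feature

    # Step 1: uniform training.
    clf = LogisticRegression(max_iter=1000).fit(X, y)

    # Step 2: per-sample gradient norm of the logistic loss w.r.t. the weights.
    p = clf.predict_proba(X)[:, 1]
    grad_norms = np.abs(p - y) * np.linalg.norm(X, axis=1)
    probs = grad_norms / grad_norms.sum()

    # Step 3: importance-batch retraining by resampling in proportion to the gradient norms.
    idx = rng.choice(n, size=n, p=probs)
    clf_debiased = LogisticRegression(max_iter=1000).fit(X[idx], y[idx])

    conflict = y != bias                                   # bias-conflicting samples
    print("bias-conflicting accuracy before / after:",
          round(clf.score(X[conflict], y[conflict]), 3),
          round(clf_debiased.score(X[conflict], y[conflict]), 3))
    ```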
  • Certifying Some Distributional Fairness with Subpopulation Decomposition. [paper]

    • Mintong Kang, Linyi Li, Maurice Weber, Yang Liu, Ce Zhang, Bo Li.
    • Key Word: Certified Fairness.
    • Digest We first formulate the certified fairness of an ML model trained on a given data distribution as an optimization problem based on the model performance loss bound on a fairness constrained distribution, which is within bounded distributional distance with the training distribution. We then propose a general fairness certification framework and instantiate it for both sensitive shifting and general shifting scenarios. In particular, we propose to solve the optimization problem by decomposing the original data distribution into analytical subpopulations and proving the convexity of the subproblems to solve them. We evaluate our certified fairness on six real-world datasets and show that our certification is tight in the sensitive shifting scenario and provides non-trivial certification under general shifting.
  • Fairness via Explanation Quality: Evaluating Disparities in the Quality of Post hoc Explanations. [paper]

    • Jessica Dai, Sohini Upadhyay, Ulrich Aivodji, Stephen H. Bach, Himabindu Lakkaraju. AIES 2022
    • Key Word: Fairness; Interpretability.
    • Digest We first outline the key properties which constitute explanation quality and where disparities can be particularly problematic. We then leverage these properties to propose a novel evaluation framework which can quantitatively measure disparities in the quality of explanations output by state-of-the-art methods. Using this framework, we carry out a rigorous empirical analysis to understand if and when group-based disparities in explanation quality arise. Our results indicate that such disparities are more likely to occur when the models being explained are complex and highly non-linear. In addition, we also observe that certain post hoc explanation methods (e.g., Integrated Gradients, SHAP) are more likely to exhibit the aforementioned disparities.
  • Long-Tailed Recognition via Weight Balancing. [paper] [code]

    • Shaden Alshammari, Yu-Xiong Wang, Deva Ramanan, Shu Kong. CVPR 2022
    • Key Word: Long-tailed Recognition; Weight Balancing.
    • Digest The key to addressing long-tailed recognition (LTR) is to balance various aspects including data distribution, training losses, and gradients in learning. We explore an orthogonal direction, weight balancing, motivated by the empirical observation that the naively trained classifier has "artificially" larger weights in norm for common classes (because there exists abundant data to train them, unlike the rare classes). We investigate three techniques to balance weights, L2-normalization, weight decay, and MaxNorm. We first point out that L2-normalization "perfectly" balances per-class weights to be unit norm, but such a hard constraint might prevent classes from learning better classifiers. In contrast, weight decay penalizes larger weights more heavily and so learns small balanced weights; the MaxNorm constraint encourages growing small weights within a norm ball but caps all the weights by the radius. Our extensive study shows that both help learn balanced weights and greatly improve the LTR accuracy. Surprisingly, weight decay, although underexplored in LTR, significantly improves over prior work. Therefore, we adopt a two-stage training paradigm and propose a simple approach to LTR: (1) learning features using the cross-entropy loss by tuning weight decay, and (2) learning classifiers using class-balanced loss by tuning weight decay and MaxNorm.
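    A short sketch of the per-class weight-balancing operations discussed above for a linear classifier head: L2-normalizing each class weight vector versus projecting it onto a MaxNorm ball. Weight decay would be applied through the optimizer's weight_decay argument; the radius and the toy head are illustrative choices.

    ```python
    import torch
    import torch.nn as nn

    classifier = nn.Linear(128, 10, bias=False)
    with torch.no_grad():
        classifier.weight[:3].mul_(10.0)        # pretend a few head classes grew large norms

    def l2_normalize_classes(linear):
        """Hard constraint: force every per-class weight vector to unit norm."""
        with torch.no_grad():
            linear.weight.div_(linear.weight.norm(dim=1, keepdim=True))

    def maxnorm_project(linear, radius=1.0):
        """Softer constraint: cap per-class weight norms at `radius`, leave small ones alone."""
        with torch.no_grad():
            norms = linear.weight.norm(dim=1, keepdim=True)
            linear.weight.mul_(torch.clamp(radius / norms, max=1.0))

    maxnorm_project(classifier, radius=1.0)     # typically applied after each optimizer step
    print(classifier.weight.norm(dim=1))        # head-class norms are now capped at 1.0
    ```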
  • Is Fairness Only Metric Deep? Evaluating and Addressing Subgroup Gaps in Deep Metric Learning. [paper] [code]

    • Natalie Dullerud, Karsten Roth, Kimia Hamidieh, Nicolas Papernot, Marzyeh Ghassemi. ICLR 2022
    • Key Word: Metric Learning; Fairness.
    • Digest We are the first to evaluate state-of-the-art DML methods trained on imbalanced data, and to show the negative impact these representations have on minority subgroup performance when used for downstream tasks. In this work, we first define fairness in DML through an analysis of three properties of the representation space -- inter-class alignment, intra-class alignment, and uniformity -- and propose finDML, the fairness in non-balanced DML benchmark to characterize representation fairness.
  • Linear Adversarial Concept Erasure. [paper] [code]

    • Shauli Ravfogel, Michael Twiton, Yoav Goldberg, Ryan Cotterell. ICML 2022
    • Key Word: Fairness; Concept Removal; Bias Mitigation; Interpretability.
    • Digest We formulate the problem of identifying and erasing a linear subspace that corresponds to a given concept, in order to prevent linear predictors from recovering the concept. We model this problem as a constrained, linear minimax game, and show that existing solutions are generally not optimal for this task. We derive a closed-form solution for certain objectives, and propose a convex relaxation, R-LACE, that works well for others. When evaluated in the context of binary gender removal, the method recovers a low-dimensional subspace whose removal mitigates bias by intrinsic and extrinsic evaluation. We show that the method -- despite being linear -- is highly expressive, effectively mitigating bias in deep nonlinear classifiers while maintaining tractability and interpretability.
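    A one-direction special case of linear concept erasure in the spirit of this line of work: fit a linear probe for the concept and project representations onto the orthogonal complement of the probe's weight direction. Removing a single direction with a logistic probe is a simplifying assumption relative to the paper's minimax formulation.

    ```python
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    Z = rng.normal(size=(1000, 32))                            # latent representations
    concept = (Z[:, 0] + 0.2 * rng.normal(size=1000) > 0).astype(int)

    probe = LogisticRegression(max_iter=1000).fit(Z, concept)  # linear probe for the concept
    w = probe.coef_.ravel()
    P = np.eye(Z.shape[1]) - np.outer(w, w) / (w @ w)          # project out the probe direction
    Z_erased = Z @ P

    before = probe.score(Z, concept)
    after = LogisticRegression(max_iter=1000).fit(Z_erased, concept).score(Z_erased, concept)
    print(f"linear concept recoverability before: {before:.2f}, after erasure: {after:.2f}")
    ```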

Interpretability

  • Measuring the Interpretability of Unsupervised Representations via Quantized Reversed Probing. [paper]

    • Iro Laina, Yuki M Asano, Andrea Vedaldi. ICLR 2022
    • Key Word: Interpretability; Unsupervision.
    • Digest Self-supervised visual representation learning has recently attracted significant research interest. While a common way to evaluate self-supervised representations is through transfer to various downstream tasks, we instead investigate the problem of measuring their interpretability, i.e. understanding the semantics encoded in raw representations. We formulate the latter as estimating the mutual information between the representation and a space of manually labelled concepts.
  • Attention-based Interpretability with Concept Transformers. [paper]

    • Mattia Rigotti, Christoph Miksovic, Ioana Giurgiu, Thomas Gschwind, Paolo Scotton. ICLR 2022
    • Key Word: Transformers; Interpretability.
    • Digest We propose the generalization of attention from low-level input features to high-level concepts as a mechanism to ensure the interpretability of attention scores within a given application domain. In particular, we design the ConceptTransformer, a deep learning module that exposes explanations of the output of a model in which it is embedded in terms of attention over user-defined high-level concepts.
  • Fooling Explanations in Text Classifiers. [paper]

    • Adam Ivankay, Ivan Girardi, Chiara Marchiori, Pascal Frossard. ICLR 2022
    • Key Word: Attribution Robustness; Natural Language Processing.
    • Digest It has been shown that explanation methods in vision applications are susceptible to local, imperceptible perturbations that can significantly alter the explanations without changing the predicted classes. We show here that the existence of such perturbations extends to text classifiers as well. Specifically, we introduce TextExplanationFooler (TEF), a novel explanation attack algorithm that alters text input samples imperceptibly so that the outcome of widely-used explanation methods changes considerably while leaving classifier predictions unchanged.
  • Explanations of Black-Box Models based on Directional Feature Interactions. [paper]

    • Aria Masoomi, Davin Hill, Zhonghui Xu, Craig P Hersh, Edwin K. Silverman, Peter J. Castaldi, Stratis Ioannidis, Jennifer Dy. ICLR 2022
    • Key Word: Explainability; Shapley Values; Feature Interactions.
    • Digest Several recent works explain black-box models by capturing the most influential features for prediction per instance; such explanation methods are univariate, as they characterize importance per feature. We extend univariate explanation to a higher-order; this enhances explainability, as bivariate methods can capture feature interactions in black-box models, represented as a directed graph.
  • Verifying And Interpreting Neural Networks using Finite Automata. [paper] [code]

    • Marco Sälzer, Eric Alsmann, Florian Bruse, Martin Lange.
    • Key Word: Finite State Automata; Verification; Interpretation.
    • Digest Verifying properties and interpreting the behaviour of deep neural networks (DNNs) is an important task given their ubiquitous use in applications, including safety-critical ones, and their black-box nature. We propose an automata-theoretic approach to tackling problems arising in DNN analysis. We show that the input-output behaviour of a DNN can be captured precisely by a (special) weak Büchi automaton of exponential size. We show how these can be used to address common verification and interpretation tasks such as adversarial robustness and minimum sufficient reasons. We report on a proof-of-concept implementation translating DNNs to automata on finite words for better efficiency at the cost of losing precision in analysis.
  • Self-explaining deep models with logic rule reasoning. [paper] [code]

    • Seungeon Lee, Xiting Wang, Sungwon Han, Xiaoyuan Yi, Xing Xie, Meeyoung Cha. NeurIPS 2022
    • Key Word: Self-explanation; Logic rule; Interpretability.
    • Digest We present SELOR, a framework for integrating self-explaining capabilities into a given deep model to achieve both high prediction performance and human precision. By “human precision”, we refer to the degree to which humans agree with the reasons models provide for their predictions. Human precision affects user trust and allows users to collaborate closely with the model. We demonstrate that logic rule explanations naturally satisfy human precision with the expressive power required for good predictive performance. We then illustrate how to enable a deep model to predict and explain with logic rules. Our method does not require predefined logic rule sets or human annotations and can be learned efficiently and easily with widely-used deep learning modules in a differentiable way. Extensive experiments show that our method gives explanations closer to human decision logic than other methods while maintaining the performance of deep learning models.
  • Efficient Dataset Distillation Using Random Feature Approximation. [paper] [code]

    • Noel Loo, Ramin Hasani, Alexander Amini, Daniela Rus. NeurIPS 2022
    • Key Word: Dataset Distillation; Interpretability; Privacy; Neural Network Gaussian Process; Neural Tangent Kernel; Random Feature.
    • Digest We propose a novel algorithm that uses a random feature approximation (RFA) of the Neural Network Gaussian Process (NNGP) kernel, which reduces the kernel matrix computation to O(|S|). Our algorithm provides at least a 100-fold speedup over KIP and can run on a single GPU. Our new method, termed an RFA Distillation (RFAD), performs competitively with KIP and other dataset condensation algorithms in accuracy over a range of large-scale datasets, both in kernel regression and finite-width network training. We demonstrate the effectiveness of our approach on tasks involving model interpretability and privacy preservation.
  • Diffusion Visual Counterfactual Explanations. [paper] [code]

    • Maximilian Augustin, Valentyn Boreiko, Francesco Croce, Matthias Hein. NeurIPS 2022
    • Key Word: Counterfactual Explainability; Diffusion Models.
    • Digest Current approaches for the generation of VCEs are restricted to adversarially robust models and often contain non-realistic artefacts, or are limited to image classification problems with few classes. In this paper, we overcome this by generating Diffusion Visual Counterfactual Explanations (DVCEs) for arbitrary ImageNet classifiers via a diffusion process.
  • "Help Me Help the AI": Understanding How Explainability Can Support Human-AI Interaction. [paper]

    • Sunnie S. Y. Kim, Elizabeth Anne Watkins, Olga Russakovsky, Ruth Fong, Andrés Monroy-Hernández.
    • Key Word: Explainable AI; Human-AI Interaction.
    • Digest Despite the proliferation of explainable AI (XAI) methods, little is understood about end-users' explainability needs. This gap is critical, because end-users may have needs that XAI methods should but don't yet support. To address this gap and contribute to understanding how explainability can support human-AI interaction, we conducted a study of a real-world AI application via interviews with 20 end-users of Merlin, a bird-identification app.
  • WeightedSHAP: analyzing and improving Shapley based feature attributions. [paper]

    • Yongchan Kwon, James Zou.
    • Key Word: Model Interpretation; Shapley Value.
    • Digest Our work performs a rigorous analysis of the potential limitations of Shapley feature attribution. We identify simple settings where the Shapley value is mathematically suboptimal by assigning larger attributions for less influential features. Motivated by this observation, we propose WeightedSHAP, which generalizes the Shapley value and learns which marginal contributions to focus directly from data.
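    A minimal sketch of the re-weighting idea above for a toy model with a handful of features, using exact coalition enumeration. This is not the authors' implementation: the per-coalition-size weights below are hypothetical placeholders that WeightedSHAP would instead learn from data, and uniform weights recover the classical Shapley value.

    ```python
    # Exact (weighted) Shapley-style attribution for a toy model with few features.
    # value_fn evaluates the model with features outside a coalition replaced by a baseline.
    from itertools import combinations
    from math import comb
    import numpy as np

    def value_fn(model, x, baseline, subset):
        """Evaluate the model with features outside `subset` set to the baseline."""
        masked = baseline.copy()
        masked[list(subset)] = x[list(subset)]
        return model(masked)

    def weighted_shap(model, x, baseline, weights=None):
        d = len(x)
        # weights[s] scales marginal contributions made to coalitions of size s
        # (hypothetical here; WeightedSHAP learns these from data).
        if weights is None:
            weights = np.ones(d)  # uniform -> classical Shapley value
        attributions = np.zeros(d)
        for i in range(d):
            others = [j for j in range(d) if j != i]
            for s in range(d):
                for S in combinations(others, s):
                    marginal = value_fn(model, x, baseline, S + (i,)) - value_fn(model, x, baseline, S)
                    # Classical Shapley kernel 1 / (d * C(d-1, s)), re-weighted by coalition size.
                    attributions[i] += weights[s] * marginal / (d * comb(d - 1, s))
        return attributions

    # Toy usage: for a linear model, uniform-weight attributions equal coefficient * (x - baseline).
    model = lambda z: 3.0 * z[0] + 1.0 * z[1] - 2.0 * z[2]
    x = np.array([1.0, 2.0, 0.5])
    baseline = np.zeros(3)
    print(weighted_shap(model, x, baseline))
    ```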
  • Concept Activation Regions: A Generalized Framework For Concept-Based Explanations. [paper] [code]

    • Jonathan Crabbé, Mihaela van der Schaar. NeurIPS 2022
    • Key Word: Concept Activation Vector.
    • Digest We propose to relax this assumption by allowing concept examples to be scattered across different clusters in the DNN's latent space. Each concept is then represented by a region of the DNN's latent space that includes these clusters and that we call concept activation region (CAR). To formalize this idea, we introduce an extension of the CAV formalism that is based on the kernel trick and support vector classifiers. This CAR formalism yields global concept-based explanations and local concept-based feature importance.
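    A minimal sketch of the concept-activation-region idea, assuming latent activations for concept examples and random (negative) examples are already available. The kernelized support vector classifier is the only modelling choice taken from the digest; the data, dimensions, and hyperparameters below are placeholders.

    ```python
    # A kernel SVC carves out a concept region (rather than a single linear CAV direction)
    # in latent space; its decision function then acts as a smooth concept score.
    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    # Hypothetical latent activations: concept examples form two separate clusters.
    concept_acts = np.concatenate([rng.normal(loc=+3.0, size=(50, 16)),
                                   rng.normal(loc=-3.0, size=(50, 16))])
    random_acts = rng.normal(loc=0.0, size=(100, 16))

    X = np.concatenate([concept_acts, random_acts])
    y = np.concatenate([np.ones(len(concept_acts)), np.zeros(len(random_acts))])

    # An RBF kernel lets the concept region cover several disjoint clusters.
    car = SVC(kernel="rbf", gamma="scale").fit(X, y)

    # Concept score for new activations: signed distance to the region boundary.
    new_acts = rng.normal(loc=3.0, size=(5, 16))
    print(car.decision_function(new_acts))
    ```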
  • EMaP: Explainable AI with Manifold-based Perturbations. [paper]

    • Minh N. Vu, Huy Q. Mai, My T. Thai.
    • Key Word: Black-box Explanations; Topological Data Analysis; Robust Machine Learning.
    • Digest The goal of this work is to introduce a novel perturbation scheme so that more faithful and robust explanations can be obtained. Our study focuses on the impact of perturbing directions on the data topology. We show that perturbing along the orthogonal directions of the input manifold better preserves the data topology, both in the worst-case analysis of the discrete Gromov-Hausdorff distance and in the average-case analysis via persistent homology.
  • Explainable AI for clinical and remote health applications: a survey on tabular and time series data. [paper]

    • Flavio Di Martino, Franca Delmastro.
    • Key Word: Explainable AI; Tabular Data; Time-series Data; Healthcare.
    • Digest To provide an overview of XAI methods that are most suitable for tabular and time series data in the healthcare domain, this paper provides a review of the literature in the last 5 years, illustrating the type of generated explanations and the efforts provided to evaluate their relevance and quality. Specifically, we identify clinical validation, consistency assessment, objective and standardised quality evaluation, and human-centered quality assessment as key features to ensure effective explanations for the end users.
  • Concept-Based Explanations for Tabular Data. [paper]

    • Varsha Pendyala, Jihye Choi.
    • Key Word: Concept Activation Vector; Conceptual Sensitivity.
    • Digest We extend TCAV, the concept attribution approach, to tabular learning, by providing an idea on how to define concepts over tabular data. On a synthetic dataset with ground-truth concept explanations and a real-world dataset, we show the validity of our method in generating interpretability results that match the human-level intuitions.
  • From Shapley Values to Generalized Additive Models and back. [paper] [code]

    • Sebastian Bordt, Ulrike von Luxburg.
    • Key Word: Shapley Values; Generalized Additive Models.
    • Digest We introduce n-Shapley Values, a natural extension of Shapley Values that explain individual predictions with interaction terms up to order n. As n increases, the n-Shapley Values converge towards the Shapley-GAM, a uniquely determined decomposition of the original function. From the Shapley-GAM, we can compute Shapley Values of arbitrary order, which gives precise insights into the limitations of these explanations. We then show that Shapley Values recover generalized additive models of order n, assuming that we allow for interaction terms up to order n in the explanations.
  • The Utility of Explainable AI in Ad Hoc Human-Machine Teaming. [paper]

    • Rohan Paleja, Muyleng Ghuy, Nadun Ranawaka Arachchige, Reed Jensen, Matthew Gombolay. NeurIPS 2021
    • Key Word: Human-Machine Teaming; Explainable AI.
    • Digest Recent advances in machine learning have led to growing interest in Explainable AI (xAI) to enable humans to gain insight into the decision-making of machine learning models. Despite this recent interest, the utility of xAI techniques has not yet been characterized in human-machine teaming. Importantly, xAI offers the promise of enhancing team situational awareness (SA) and shared mental model development, which are the key characteristics of effective human-machine teams. Rapidly developing such mental models is especially critical in ad hoc human-machine teaming, where agents do not have a priori knowledge of others' decision-making strategies. In this paper, we present two novel human-subject experiments quantifying the benefits of deploying xAI techniques within a human-machine teaming scenario.
  • Responsibility: An Example-based Explainable AI approach via Training Process Inspection. [paper]

    • Faraz Khadivpour, Arghasree Banerjee, Matthew Guzdial.
    • Key Word: Influential Training Samples.
    • Digest We present a novel XAI approach we call Responsibility that identifies the most responsible training example for a particular decision. This example can then be shown as an explanation: "this is what I (the AI) learned that led me to do that". We present experimental results across a number of domains along with the results of an Amazon Mechanical Turk user study, comparing responsibility and existing XAI methods on an image classification task. Our results demonstrate that responsibility can help improve accuracy for both human end users and secondary ML models.
  • Measuring the Interpretability of Unsupervised Representations via Quantized Reverse Probing. [paper]

    • Iro Laina, Yuki M. Asano, Andrea Vedaldi. ICLR 2022
    • Key Word: Interpretability; Self-Supervised Learning.
    • Digest While a common way to evaluate self-supervised representations is through transfer to various downstream tasks, we instead investigate the problem of measuring their interpretability, i.e. understanding the semantics encoded in raw representations. We formulate the latter as estimating the mutual information between the representation and a space of manually labelled concepts. To quantify this we introduce a decoding bottleneck: information must be captured by simple predictors, mapping concepts to clusters in representation space. This approach, which we call reverse linear probing, provides a single number sensitive to the semanticity of the representation.
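    A minimal sketch of the quantized reverse-probing recipe under simplifying assumptions: representations are quantized with k-means and semanticity is scored by the mutual information between cluster assignments and concept labels. The representations and concept labels below are synthetic placeholders, not the benchmark used in the paper.

    ```python
    # Quantize a frozen representation and measure how much information the clusters
    # carry about human-labelled concepts.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.metrics import mutual_info_score

    rng = np.random.default_rng(0)
    concepts = rng.integers(0, 10, size=2000)                     # hypothetical concept labels
    reps = rng.normal(size=(2000, 64)) + concepts[:, None] * 0.5  # representation weakly encodes concepts

    clusters = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(reps)

    # Mutual information (in nats) between the quantized representation and the concepts:
    # higher values indicate a more semantic, interpretable representation.
    print(mutual_info_score(concepts, clusters))
    ```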
  • "Is your explanation stable?": A Robustness Evaluation Framework for Feature Attribution. [paper]

    • Yuyou Gan, Yuhao Mao, Xuhong Zhang, Shouling Ji, Yuwen Pu, Meng Han, Jianwei Yin, Ting Wang. CCS 2022
    • Key Word: Attributional Robustness.
    • Digest We propose a model-agnostic method, Median Test for Feature Attribution (MeTFA), to quantify the uncertainty and increase the stability of explanation algorithms with theoretical guarantees. MeTFA has the following two functions: (1) examine whether one feature is significantly important or unimportant and generate a MeTFA-significant map to visualize the results; (2) compute the confidence interval of a feature attribution score and generate a MeTFA-smoothed map to increase the stability of the explanation.
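    A rough sketch in the spirit of MeTFA rather than the authors' exact test: an attribution method is repeated under input noise, the pointwise median serves as a stabilized map, and a simple sign test flags features whose attribution significantly exceeds a reference level. The saliency function, noise model, and reference level are assumptions for illustration.

    ```python
    import numpy as np
    from scipy.stats import binomtest

    def stabilized_attribution(saliency_fn, x, n_samples=30, noise=0.05, ref=0.0, alpha=0.05):
        # Re-run the attribution method under random input perturbations.
        maps = np.stack([saliency_fn(x + noise * np.random.randn(*x.shape))
                         for _ in range(n_samples)])
        smoothed = np.median(maps, axis=0)                  # stabilized ("smoothed") map
        above = (maps > ref).sum(axis=0)                    # how often each feature exceeds ref
        pvals = np.vectorize(
            lambda k: binomtest(int(k), n_samples, 0.5, alternative="greater").pvalue)(above)
        significant = pvals < alpha                         # significance map
        return smoothed, significant

    # Toy usage: the gradient of sum(z**2) stands in for any attribution method.
    saliency_fn = lambda z: 2 * z
    x = np.array([1.0, -0.2, 0.0, 0.8])
    smoothed, significant = stabilized_attribution(saliency_fn, x)
    print(smoothed, significant)
    ```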
  • Interpreting Black-box Machine Learning Models for High Dimensional Datasets. [paper]

    • Md. Rezaul Karim, Md. Shajalal, Alex Graß, Till Döhmen, Sisay Adugna Chala, Christian Beecks, Stefan Decker.
    • Key Word: Curse of Dimensionality; Interpretability; Attention mechanism; Model Surrogation.
    • Digest We first train a black-box model on a high-dimensional dataset to learn the embeddings on which the classification is performed. To decompose the inner working principles of the black-box model and to identify top-k important features, we employ different probing and perturbing techniques. We then approximate the behavior of the black-box model by means of an interpretable surrogate model on the top-k feature space. Finally, we derive decision rules and local explanations from the surrogate model to explain individual decisions.
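    A minimal sketch of the surrogate recipe described above, with one substitution: permutation importance stands in for the paper's probing and perturbing techniques when selecting the top-k features. The dataset, the black-box model, and k are placeholders.

    ```python
    # Train a black box, pick the top-k features, and fit an interpretable surrogate
    # on that reduced feature space to mimic the black box's decisions.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.inspection import permutation_importance
    from sklearn.tree import DecisionTreeClassifier, export_text

    X, y = make_classification(n_samples=1000, n_features=50, n_informative=5, random_state=0)
    black_box = GradientBoostingClassifier(random_state=0).fit(X, y)

    k = 5
    imp = permutation_importance(black_box, X, y, n_repeats=5, random_state=0).importances_mean
    top_k = np.argsort(imp)[-k:]

    # Interpretable surrogate: mimic the black box's predictions on the top-k features.
    surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
    surrogate.fit(X[:, top_k], black_box.predict(X))
    print("fidelity:", surrogate.score(X[:, top_k], black_box.predict(X)))
    print(export_text(surrogate, feature_names=[f"f{i}" for i in top_k]))
    ```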
  • SoK: Explainable Machine Learning for Computer Security Applications. [paper]

    • Azqa Nadeem, Daniël Vos, Clinton Cao, Luca Pajola, Simon Dieck, Robert Baumgartner, Sicco Verwer.
    • Key Word: Explainable Machine Learning; Cybersecurity; Survey.
    • Digest Explainable Artificial Intelligence (XAI) is a promising solution to improve the transparency of machine learning (ML) pipelines. We systematize the increasingly growing (but fragmented) microcosm of studies that develop and utilize XAI methods for defensive and offensive cybersecurity tasks. We identify 3 cybersecurity stakeholders, i.e., model users, designers, and adversaries, that utilize XAI for 5 different objectives within an ML pipeline, namely 1) XAI-enabled decision support, 2) applied XAI for security tasks, 3) model verification via XAI, 4) explanation verification & robustness, and 5) offensive use of explanations.
  • Auditing Visualizations: Transparency Methods Struggle to Detect Anomalous Behavior. [paper]

    • Jean-Stanislas Denain, Jacob Steinhardt.
    • Key Word: Anomalous Models; Feature Attributions;
    • Digest Transparency methods such as model visualizations provide information that outputs alone might miss, since they describe the internals of neural networks. But can we trust that model explanations reflect model behavior? For instance, can they diagnose abnormal behavior such as backdoors or shape bias? To evaluate model explanations, we define a model as anomalous if it differs from a reference set of normal models, and we test whether transparency methods assign different explanations to anomalous and normal models. We find that while existing methods can detect stark anomalies such as shape bias or adversarial training, they struggle to identify more subtle anomalies such as models trained on incomplete data.
  • When adversarial attacks become interpretable counterfactual explanations. [paper]

    • Mathieu Serrurier, Franck Mamalet, Thomas Fel, Louis Béthune, Thibaut Boissin.
    • Key Word: Explainability; Interpretability; Saliency Maps.
    • Digest We argue that, when learning a 1-Lipschitz neural network with the dual loss of an optimal transportation problem, the gradient of the model is both the direction of the transportation plan and the direction to the closest adversarial attack. Traveling along the gradient to the decision boundary is no longer an adversarial attack but becomes a counterfactual explanation, explicitly transporting from one class to the other. Through extensive experiments on XAI metrics, we find that the simple saliency map method, applied on such networks, becomes a reliable explanation, and outperforms the state-of-the-art explanation approaches on unconstrained models. The proposed networks were already known to be certifiably robust, and we prove that they are also explainable with a fast and simple method.
  • Which Explanation Should I Choose? A Function Approximation Perspective to Characterizing Post hoc Explanations. [paper]

    • Tessa Han, Suraj Srinivas, Himabindu Lakkaraju. NeurIPS 2022
    • Key Word: Analyzing Post-hoc Explanations.
    • Digest We adopt a function approximation perspective and formalize the local function approximation (LFA) framework. We show that popular explanation methods are instances of this framework, performing function approximations of the underlying model in different neighborhoods using different loss functions. We introduce a no free lunch theorem for explanation methods which demonstrates that no single method can perform optimally across all neighbourhoods and calls for choosing among methods.
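    A minimal sketch of the local function approximation view: a simple weighted linear model is fitted to the black box's outputs on perturbations around the input, which is one (LIME-style) instance of the framework. The black box, neighbourhood width, and loss below are illustrative assumptions.

    ```python
    import numpy as np
    from sklearn.linear_model import Ridge

    def local_linear_explanation(black_box, x, n_samples=500, sigma=0.3):
        rng = np.random.default_rng(0)
        neighbourhood = x + sigma * rng.normal(size=(n_samples, x.shape[0]))
        targets = black_box(neighbourhood)
        # Weight perturbations by proximity to x (Gaussian kernel), as LIME-style
        # instances of the framework do; other choices yield other explainers.
        weights = np.exp(-np.sum((neighbourhood - x) ** 2, axis=1) / (2 * sigma ** 2))
        lin = Ridge(alpha=1.0).fit(neighbourhood, targets, sample_weight=weights)
        return lin.coef_  # local feature importances

    black_box = lambda Z: np.tanh(Z[:, 0] * Z[:, 1]) + 0.5 * Z[:, 2]
    x = np.array([0.5, -1.0, 2.0])
    print(local_linear_explanation(black_box, x))
    ```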
  • Concept-level Debugging of Part-Prototype Networks. [paper] [code]

    • Andrea Bontempelli, Stefano Teso, Fausto Giunchiglia, Andrea Passerini.
    • Key Word: Part-Prototype Networks; Concept-level Debugging.
    • Digest We propose ProtoPDebug, an effective concept-level debugger for ProtoPNets in which a human supervisor, guided by the model's explanations, supplies feedback in the form of what part-prototypes must be forgotten or kept, and the model is fine-tuned to align with this supervision. An extensive empirical evaluation on synthetic and real-world data shows that ProtoPDebug outperforms state-of-the-art debuggers for a fraction of the annotation cost.
  • Post-hoc Concept Bottleneck Models. [paper] [code]

    • Mert Yuksekgonul, Maggie Wang, James Zou.
    • Key Word: Concept Bottleneck Models; Model Editing.
    • Digest We address the limitations of CBMs by introducing Post-hoc Concept Bottleneck models (PCBMs). We show that we can turn any neural network into a PCBM without sacrificing model performance while still retaining interpretability benefits. When concept annotation is not available on the training data, we show that PCBM can transfer concepts from other datasets or from natural language descriptions of concepts. PCBM also enables users to quickly debug and update the model to reduce spurious correlations and improve generalization to new (potentially different) data.
  • Towards Better Understanding Attribution Methods. [paper]

    • Sukrut Rao, Moritz Böhle, Bernt Schiele. CVPR 2022
    • Key Word: Post-hoc Attribution.
    • Digest Deep neural networks are very successful on many vision tasks, but hard to interpret due to their black box nature. To overcome this, various post-hoc attribution methods have been proposed to identify image regions most influential to the models' decisions. Evaluating such methods is challenging since no ground truth attributions exist. We thus propose three novel evaluation schemes to more reliably measure the faithfulness of those methods, to make comparisons between them more fair, and to make visual inspection more systematic.
  • B-cos Networks: Alignment is All We Need for Interpretability. [paper] [code]

    • Moritz Böhle, Mario Fritz, Bernt Schiele.
    • Key Word: Weight-Input Alignment.
    • Digest We present a new direction for increasing the interpretability of deep neural networks (DNNs) by promoting weight-input alignment during training. For this, we propose to replace the linear transforms in DNNs by our B-cos transform. As we show, a sequence (network) of such transforms induces a single linear transform that faithfully summarises the full model computations.
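    A minimal PyTorch sketch of a B-cos-style linear unit as described above; it keeps only the core idea (unit-norm weights, with the output scaled by |cosine similarity|^(B-1)) and omits the architectural details of the paper's networks.

    ```python
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class BcosLinear(nn.Module):
        def __init__(self, in_features, out_features, b=2.0):
            super().__init__()
            self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
            self.b = b

        def forward(self, x):
            w_hat = F.normalize(self.weight, dim=1)            # unit-norm weight rows
            lin = F.linear(x, w_hat)                           # ŵ·x
            x_norm = x.norm(dim=-1, keepdim=True).clamp_min(1e-6)
            cos = lin / x_norm                                 # cosine similarity per unit
            return lin * cos.abs().pow(self.b - 1)             # large outputs require alignment

    layer = BcosLinear(8, 4, b=2.0)
    print(layer(torch.randn(3, 8)).shape)  # torch.Size([3, 4])
    ```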
  • Discovering Latent Concepts Learned in BERT. [paper]

    • Fahim Dalvi, Abdul Rafae Khan, Firoj Alam, Nadir Durrani, Jia Xu, Hassan Sajjad. ICLR 2022
    • Key Word: Interpretability; Natural Language Processing.
    • Digest We study: i) what latent concepts exist in the pre-trained BERT model, ii) how the discovered latent concepts align or diverge from classical linguistic hierarchy and iii) how the latent concepts evolve across layers. Our findings show: i) a model learns novel concepts (e.g. animal categories and demographic groups), which do not strictly adhere to any pre-defined categorization (e.g. POS, semantic tags), ii) several latent concepts are based on multiple properties which may include semantics, syntax, and morphology, iii) the lower layers in the model dominate in learning shallow lexical concepts while the higher layers learn semantic relations and iv) the discovered latent concepts highlight potential biases learned in the model.
  • Do Users Benefit From Interpretable Vision? A User Study, Baseline, And Dataset. [paper] [code]

    • Leon Sixt, Martin Schuessler, Oana-Iuliana Popescu, Philipp Weiß, Tim Landgraf. ICLR 2022
    • Key Word: Interpretability; Human Subject Evaluation.
    • Digest We assess if participants can identify the relevant set of attributes compared to the ground-truth. Our results show that the baseline outperformed concept-based explanations. Counterfactual explanations from an invertible neural network performed similarly as the baseline.
  • Model Agnostic Interpretability for Multiple Instance Learning. [paper] [code]

    • Joseph Early, Christine Evers, Sarvapali Ramchurn. ICLR 2022
    • Key Word: Multiple Instance Learning, Interpretability.
    • Digest In Multiple Instance Learning (MIL), models are trained using bags of instances, where only a single label is provided for each bag. A bag label is often only determined by a handful of key instances within a bag, making it difficult to interpret what information a classifier is using to make decisions. In this work, we establish the key requirements for interpreting MIL models. We then go on to develop several model-agnostic approaches that meet these requirements.
Open-World Learning

  • Understanding Open-Set Recognition by Jacobian Norm of Representation. [paper]

    • Jaewoo Park, Hojin Park, Eunju Jeong, Andrew Beng Jin Teoh.
    • Key Word: Open-Set Recognition; Out-of-Distribution Detection.
    • Digest This paper analyzes this emergent phenomenon by observing the Jacobian norm of the representation. We theoretically show that minimizing the intra-class distances within the known set reduces the Jacobian norm of known class representations while maximizing the inter-class distances within the known set increases the Jacobian norm of the unknown class. The closed-set metric learning thus separates the unknown from the known by forcing their Jacobian norm values to differ. We empirically validate our theoretical framework with ample evidence on standard OSR datasets.
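    A minimal sketch of scoring inputs by the Jacobian norm of the representation, in the spirit of the analysis above. The encoder and the two inputs are placeholders, and a real open-set pipeline would calibrate a rejection threshold on known-class data.

    ```python
    import torch
    import torch.nn as nn

    encoder = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 16))

    def jacobian_norm(encoder, x):
        """Frobenius norm of the Jacobian of the representation w.r.t. a single input."""
        jac = torch.autograd.functional.jacobian(encoder, x.unsqueeze(0))
        return jac.flatten().norm().item()

    known = torch.randn(32)            # hypothetical in-distribution sample
    unknown = 5.0 * torch.randn(32)    # hypothetical far-away sample
    print(jacobian_norm(encoder, known), jacobian_norm(encoder, unknown))
    ```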
  • Holistic Segmentation. [paper]

    • Stefano Gasperini, Frithjof Winkelmann, Alvaro Marcos-Ramiro, Micheal Schmidt, Nassir Navab, Benjamin Busam, Federico Tombari.
    • Key Word: Open-set Panoptic Segmentation; Open-set Perception; Uncertainty Estimation.
    • Digest We broaden the scope by proposing holistic segmentation: a task that identifies and separates unseen unknown objects into instances, without learning from unknowns, while performing panoptic segmentation of known classes. We tackle this new problem with U3HS, which first finds unknowns as highly uncertain regions, then clusters the corresponding instance-aware embeddings into individual objects.
  • Fine-grain Inference on Out-of-Distribution Data with Hierarchical Classification. [paper]

    • Randolph Linderman, Jingyang Zhang, Nathan Inkawhich, Hai Li, Yiran Chen.
    • Key Word: Fine-grain Out-of-Distribution Detection.
    • Digest When the OOD sample significantly overlaps with the training data, a binary anomaly detection is not interpretable or explainable and provides little information to the user. We propose a new model for OOD detection that makes predictions at varying levels of granularity: as the inputs become more ambiguous, the model predictions become coarser and more conservative. Consider an animal classifier that encounters an unknown bird species and a car. Both cases are OOD, but the user gains more information if the classifier recognizes that its uncertainty over the particular species is too large and predicts "bird" instead of flagging the input as OOD.
  • Measuring Human Perception to Improve Open Set Recognition. [paper]

    • Jin Huang, Derek Prijatelj, Justin Dulay, Walter Scheirer.
    • Key Word: Open Set Recognition; Novelty Detection; Visual Psychophysics.
    • Digest We designed and performed a large-scale behavioral experiment that collected over 200,000 human reaction time measurements associated with object recognition. The data collected indicated reaction time varies meaningfully across objects at the sample level. We therefore designed a new psychophysical loss function that enforces consistency with human behavior in deep networks which exhibit variable reaction time for different images.
  • Open-Set Semi-Supervised Object Detection. [paper] [code]

    • Yen-Cheng Liu, Chih-Yao Ma, Xiaoliang Dai, Junjiao Tian, Peter Vajda, Zijian He, Zsolt Kira.
    • Key Word: Open-Set Semi-Supervised Learning; Object Detection.
    • Digest We first find the existing SSOD method obtains a lower performance gain in open-set conditions, and this is caused by the semantic expansion, where the distracting OOD objects are mispredicted as in-distribution pseudo-labels for the semi-supervised training. To address this problem, we consider online and offline OOD detection modules, which are integrated with SSOD methods.
  • Detecting the unknown in Object Detection. [paper]

    • Dario Fontanel, Matteo Tarantino, Fabio Cermelli, Barbara Caputo.
    • Key Word: Open-Set Detection; Discovery of Unknown Class.
    • Digest In this work, we address the problem of detecting unknown objects, known as open-set object detection. We propose a novel training strategy, called UNKAD, able to predict unknown objects without requiring any annotation of them, exploiting non-annotated objects that are already present in the background of training images. In particular, exploiting the four-step training strategy of Faster R-CNN, UNKAD first identifies and pseudo-labels unknown objects and then uses the pseudo-annotations to train an additional unknown class. While UNKAD can directly detect unknown objects, we further combine it with previous unknown detection techniques, showing that it improves their performance at no extra cost.
  • Towards Open Set Video Anomaly Detection. [paper]

    • Yuansheng Zhu, Wentao Bao, Qi Yu. ECCV 2022
    • Key Word: Open-Set Recognition; Video Anomaly Detection.
    • Digest Open Set Video Anomaly Detection (OpenVAD) aims to identify abnormal events from video data where both known anomalies and novel ones exist in testing. Unsupervised models learned solely from normal videos are applicable to any testing anomalies but suffer from a high false positive rate. In contrast, weakly supervised methods are effective in detecting known anomalies but could fail in an open world. We develop a novel weakly supervised method for the OpenVAD problem by integrating evidential deep learning (EDL) and normalizing flows (NFs) into a multiple instance learning (MIL) framework.
  • Self-Trained Proposal Networks for the Open World. [paper]

    • Matthew Inkawhich, Nathan Inkawhich, Hai Li, Yiran Chen.
    • Key Word: Self-Training; Open-Set Detection; Class-Agnostic Object Proposal.
    • Digest We propose a classification-free Self-Trained Proposal Network (STPN) that leverages a novel self-training optimization strategy combined with dynamically weighted loss functions that account for challenges such as class imbalance and pseudo-label uncertainty. Not only is our model designed to excel in existing optimistic open-world benchmarks, but also in challenging operating environments where there is significant label bias. To showcase this, we devise two challenges to test the generalization of proposal models when the training data contains (1) less diversity within the labeled classes, and (2) fewer labeled instances.
  • Single-Stage Open-world Instance Segmentation with Cross-task Consistency Regularization. [paper]

    • Xizhe Xue, Dongdong Yu, Lingqiao Liu, Yu Liu, Ying Li, Zehuan Yuan, Ping Song, Mike Zheng Shou.
    • Key Word: Class-agnostic; Open-world Instance Segmentation; Cross-task Consistency Loss.
    • Digest Open-world instance segmentation (OWIS) aims to segment class-agnostic instances from images, which has a wide range of real-world applications such as autonomous driving. Most existing approaches follow a two-stage pipeline: performing class-agnostic detection first and then class-specific mask segmentation. In contrast, this paper proposes a single-stage framework to produce a mask for each instance directly. Also, instance mask annotations could be noisy in the existing datasets; to overcome this issue, we introduce a new regularization loss. Specifically, we first train an extra branch to perform an auxiliary task of predicting foreground regions, and then encourage the prediction from the auxiliary branch to be consistent with the predictions of the instance masks. The key insight is that such a cross-task consistency loss could act as an error-correcting mechanism to combat the errors in annotations.
  • Open Long-Tailed Recognition in a Dynamic World. [paper]

    • Ziwei Liu, Zhongqi Miao, Xiaohang Zhan, Jiayun Wang, Boqing Gong, Stella X. Yu. TPAMI
    • Key Word: Long-Tailed Recognition; Few-shot Learning; Open-Set Recognition; Active Learning.
    • Digest Real world data often exhibits a long-tailed and open-ended (with unseen classes) distribution. A practical recognition system must balance between majority (head) and minority (tail) classes, generalize across the distribution, and acknowledge novelty upon the instances of unseen classes (open classes). We define Open Long-Tailed Recognition++ (OLTR++) as learning from such naturally distributed data and optimizing for the classification accuracy over a balanced test set which includes both known and open classes. OLTR++ handles imbalanced classification, few-shot learning, open-set recognition, and active learning in one integrated algorithm, whereas existing classification approaches often focus only on one or two aspects and deliver poorly over the entire spectrum.
  • From Known to Unknown: Quality-aware Self-improving Graph Neural Network for Open Set Social Event Detection. [paper] [code]

    • Jiaqian Ren, Lei Jiang, Hao Peng, Yuwei Cao, Jia Wu, Philip S. Yu, Lifang He.
    • Key Word: Open-set Social Event Detection; Graph Neural Network; Classification.
    • Digest To address this problem, we design a Quality-aware Self-improving Graph Neural Network (QSGNN) which extends the knowledge from known to unknown by leveraging the best of known samples and reliable knowledge transfer. Specifically, to fully exploit the labeled data, we propose a novel supervised pairwise loss with an additional orthogonal inter-class relation constraint to train the backbone GNN encoder. The learnt, already-known events further serve as strong reference bases for the unknown ones, which greatly prompts knowledge acquisition and transfer. When the model is generalized to unknown data, to ensure the effectiveness and reliability, we further leverage the reference similarity distribution vectors for pseudo pairwise label generation, selection and quality assessment. Besides, we propose a novel quality-guided optimization in which the contributions of pseudo labels are weighted based on consistency.
  • Open-world Contrastive Learning. [paper]

    • Yiyou Sun, Yixuan Li.
    • Key Word: Contrastive learning; Open-world; Classification.
    • Digest In this paper, we enrich the landscape of representation learning by tapping into an open-world setting, where unlabeled samples from novel classes can naturally emerge in the wild. To bridge the gap, we introduce a new learning framework, open-world contrastive learning (OpenCon). OpenCon tackles the challenges of learning compact representations for both known and novel classes, and facilitates novelty discovery along the way. We demonstrate the effectiveness of OpenCon on challenging benchmark datasets and establish competitive performance.
  • Few-Shot Class-Incremental Learning from an Open-Set Perspective. [paper] [code]

    • Can Peng, Kun Zhao, Tianren Wang, Meng Li, Brian C. Lovell. ECCV 2022
    • Key Word: Few-shot Class-Incremental Learning; Open-set; One-shot; Classification.
    • Digest Here we explore the important task of Few-Shot Class-Incremental Learning (FSCIL) and its extreme data scarcity condition of one-shot. An ideal FSCIL model needs to perform well on all classes, regardless of their presentation order or paucity of data. It also needs to be robust to open-set real-world conditions and be easily adapted to the new tasks that always arise in the field. In this paper, we first reevaluate the current task setting and propose a more comprehensive and practical setting for the FSCIL task. Then, inspired by the similarity of the goals for FSCIL and modern face recognition systems, we propose our method -- Augmented Angular Loss Incremental Classification or ALICE. In ALICE, instead of the commonly used cross-entropy loss, we propose to use the angular penalty loss to obtain well-clustered features. As the obtained features not only need to be compactly clustered but also diverse enough to maintain generalization for future incremental classes, we further discuss how class augmentation, data augmentation, and data balancing affect classification performance.
  • Open World Learning Graph Convolution for Latency Estimation in Routing Networks. [paper]

    • Yifei Jin, Marios Daoutis, Sarunas Girdzijauskas, Aristides Gionis. IJCNN 2022
    • Key Word: Open-world Learning; Modeling Network Routing; Software Defined Networking.
    • Digest Accurate routing network status estimation is a key component in Software Defined Networking. We propose a novel approach for modeling network routing, using Graph Neural Networks. Our method can also be used for network-latency estimation. Supported by a domain-knowledge-assisted graph formulation, our model shares a stable performance across different network sizes and configurations of routing networks, while at the same time being able to extrapolate towards unseen sizes, configurations, and user behavior. We show that our model outperforms most conventional deep-learning-based models, in terms of prediction accuracy, computational resources, inference speed, as well as ability to generalize towards open-world input.
  • Visual Recognition by Request. [paper] [code]

    • Chufeng Tang, Lingxi Xie, Xiaopeng Zhang, Xiaolin Hu, Qi Tian.
    • Key Word: Visual Recognition by Request; Open-domain; Knowledge Base.
    • Digest In this paper, we present a novel protocol of annotation and evaluation for visual recognition. Different from traditional settings, the protocol does not require the labeler/algorithm to annotate/recognize all targets (objects, parts, etc.) at once, but instead raises a number of recognition instructions and the algorithm recognizes targets by request. This mechanism brings two beneficial properties to reduce the burden of annotation, namely, (i) variable granularity: different scenarios can have different levels of annotation, in particular, object parts can be labeled only in large and clear instances, (ii) being open-domain: new concepts can be added to the database in minimal costs. To deal with the proposed setting, we maintain a knowledge base and design a query-based visual recognition framework that constructs queries on-the-fly based on the requests. We evaluate the recognition system on two mixed-annotated datasets, CPP and ADE20K, and demonstrate its promising ability of learning from partially labeled data as well as adapting to new concepts with only text labels.
  • Towards Open Set 3D Learning: A Benchmark on Object Point Clouds. [paper] [code]

    • Antonio Alliegro, Francesco Cappio Borlino, Tatiana Tommasi.
    • Key Word: Open-set 3D Learning; In-domain and Cross-domain; Out-of-distribution.
    • Digest In this context exploiting 3D data can be a valuable asset since it conveys rich information about the geometry of sensed objects and scenes. This paper provides the first broad study on Open Set 3D learning. We introduce a novel testbed with settings of increasing difficulty in terms of category semantic shift and cover both in-domain (synthetic-to-synthetic) and cross-domain (synthetic-to-real) scenarios. Moreover, we investigate the related out-of-distribution and Open Set 2D literature to understand if and how their most recent approaches are effective on 3D data. Our extensive benchmark positions several algorithms in the same coherent picture, revealing their strengths and limitations.
  • Semantic Abstraction: Open-World 3D Scene Understanding from 2D Vision-Language Models. [paper] [code]

    • Huy Ha, Shuran Song.
    • Key Word: Open-set Vocabulary; 3D scene understanding; Zero-shot.
    • Digest We study open-world 3D scene understanding, a family of tasks that require agents to reason about their 3D environment with an open-set vocabulary and out-of-domain visual inputs - a critical skill for robots to operate in the unstructured 3D world. Towards this end, we propose Semantic Abstraction (SemAbs), a framework that equips 2D Vision-Language Models (VLMs) with new 3D spatial capabilities, while maintaining their zero-shot robustness. We achieve this abstraction using relevancy maps extracted from CLIP, and learn 3D spatial and geometric reasoning skills on top of those abstractions in a semantic-agnostic manner. We demonstrate the usefulness of SemAbs on two open-world 3D scene understanding tasks: 1) completing partially observed objects and 2) localizing hidden objects from language descriptions.
  • UC-OWOD: Unknown-Classified Open World Object Detection. [paper] [code]

    • Quanshi Zhang, Xin Wang, Jie Ren, Xu Cheng, Shuyun Lin, Yisen Wang, Xiangming Zhu. ECCV 2022
    • Key Word: Open World Object Detection.
    • Digest Open World Object Detection (OWOD) is a challenging computer vision problem that requires detecting unknown objects and gradually learning the identified unknown classes. However, it cannot distinguish between unknown instances that belong to different unknown classes. In this work, we propose a novel OWOD problem called Unknown-Classified Open World Object Detection (UC-OWOD). UC-OWOD aims to detect unknown instances and classify them into different unknown classes. Besides, we formulate the problem and devise a two-stage object detector to solve UC-OWOD.
  • Difficulty-Aware Simulator for Open Set Recognition. [paper] [code]

    • WonJun Moon, Junho Park, Hyun Seok Seong, Cheol-Ho Cho, Jae-Pil Heo. ECCV 2022
    • Key Word: Open-set Recognition; Generative Adversarial Network.
    • Digest We present a novel framework, DIfficulty-Aware Simulator (DIAS), that generates fakes with diverse difficulty levels to simulate the real world. We first investigate fakes from a generative adversarial network (GAN) from the classifier's viewpoint and observe that these are not severely challenging. This leads us to define the criteria for difficulty by regarding samples generated with GANs as having moderate difficulty. To produce hard-difficulty examples, we introduce Copycat, which imitates the behavior of the classifier. Furthermore, moderate- and easy-difficulty samples are also yielded by our modified GAN and Copycat, respectively.
  • More Practical Scenario of Open-set Object Detection: Open at Category Level and Closed at Super-category Level. [paper]

    • Yusuke Hosoya, Masanori Suganuma, Takayuki Okatani.
    • Key Word: Open-set Object Detection; Super-category.
    • Digest We first point out that the scenario of OSOD considered in recent studies, which considers an unlimited variety of unknown objects similar to open-set recognition (OSR), has a fundamental issue. That is, we cannot determine what to detect and what not for such unlimited unknown objects, which is necessary for detection tasks. This issue leads to difficulty with the evaluation of methods' performance on unknown object detection. We then introduce a novel scenario of OSOD, which deals with only unknown objects that share the super-category with known objects. It has many real-world applications, e.g., detecting an increasing number of fine-grained objects. This new setting is free from the above issue and evaluation difficulty. Moreover, it makes detecting unknown objects more realistic owing to the visual similarity between known and unknown objects.
  • DenseHybrid: Hybrid Anomaly Detection for Dense Open-set Recognition. [paper] [code]

    • Matej Grcić, Petra Bevandić, Siniša Šegvić. ECCV 2022
    • Key Word: Anomaly detection; Dense anomaly detection; Open-set Recognition.
    • Digest We design a novel hybrid algorithm based on reinterpreting discriminative logits as a logarithm of the unnormalized joint distribution p̂(x,y). Our model builds on a shared convolutional representation from which we recover three dense predictions: i) the closed-set class posterior P(y|x), ii) the dataset posterior P(din|x), iii) the unnormalized data likelihood p̂(x). The latter two predictions are trained both on the standard training data and on a generic negative dataset. We blend these two predictions into a hybrid anomaly score which allows dense open-set recognition on large natural images. We carefully design a custom loss for the data likelihood in order to avoid backpropagation through the intractable normalizing constant Z(θ). Experiments evaluate our contributions on standard dense anomaly detection benchmarks as well as in terms of open-mIoU, a novel metric for dense open-set performance.
  • Towards Realistic Semi-Supervised Learning. [paper] [code]

    • Mamshad Nayeem Rizve, Navid Kardan, Mubarak Shah. ECCV 2022 oral
    • Key Word: Semi-supervised Learning; Open-world SSL; Discovery of Unknown class.
    • Digest The standard SSL approach assumes unlabeled data are from the same distribution as annotated data. Recently, a more realistic SSL problem, called open-world SSL, is introduced, where the unannotated data might contain samples from unknown classes. In this paper, we propose a novel pseudo-label based approach to tackle SSL in open-world setting. At the core of our method, we utilize sample uncertainty and incorporate prior knowledge about class distribution to generate reliable class-distribution-aware pseudo-labels for unlabeled data belonging to both known and unknown classes. We also highlight the flexibility of our approach in solving novel class discovery task, demonstrate its stability in dealing with imbalanced data, and complement our approach with a technique to estimate the number of novel classes.
  • Open-world Semantic Segmentation for LIDAR Point Clouds. [paper] [code]

    • Jun Cen, Peng Yun, Shiwei Zhang, Junhao Cai, Di Luan, Michael Yu Wang, Ming Liu, Mingqian Tang. ECCV 2022
    • Key Word: Open-world Semantic Segmentation; LIDAR Point Clouds; Incremental Learning.
    • Digest In this work, we propose the open-world semantic segmentation task for LIDAR point clouds, which aims to 1) identify both old and novel classes using open-set semantic segmentation, and 2) gradually incorporate novel objects into the existing knowledge base using incremental learning without forgetting old classes. For this purpose, we propose a REdundAncy cLassifier (REAL) framework to provide a general architecture for both the open-set semantic segmentation and incremental learning problems.
  • Open Vocabulary Object Detection with Proposal Mining and Prediction Equalization. [paper] [code]

    • Peixian Chen, Kekai Sheng, Mengdan Zhang, Yunhang Shen, Ke Li, Chunhua Shen.
    • Key Word: Open-vocabulary Object Detection; Backdoor Adjustment.
    • Digest Open-vocabulary object detection (OVD) aims to scale up vocabulary size to detect objects of novel categories beyond the training vocabulary. We present MEDet, a novel and effective OVD framework with proposal mining and prediction equalization. First, we design an online proposal mining to refine the inherited vision-semantic knowledge from coarse to fine, allowing for proposal-level detection-oriented feature alignment. Second, based on causal inference theory, we introduce a class-wise backdoor adjustment to reinforce the predictions on novel categories to improve the overall OVD performance.
  • Rethinking the Openness of CLIP. [paper]

    • Shuhuai Ren, Lei Li, Xuancheng Ren, Guangxiang Zhao, Xu Sun.
    • Key Word: Open-vocabulary; CLIP; Rethinking; In-depth Analysis.
    • Digest Contrastive Language-Image Pre-training (CLIP) has demonstrated great potential in realizing open-vocabulary image classification in a matching style, because of its holistic use of natural language supervision that covers unconstrained real-world visual concepts. However, it is, in turn, also difficult to evaluate and analyze the openness of CLIP-like models, since they are in theory open to any vocabulary but the actual accuracy varies. To address the insufficiency of conventional studies on openness, we resort to an incremental view and define the extensibility, which essentially approximates the model's ability to deal with new visual concepts, by evaluating openness through vocabulary expansions. Our evaluation based on extensibility shows that CLIP-like models are hardly truly open and their performances degrade as the vocabulary expands to different degrees. Further analysis reveals that the over-estimation of openness is not because CLIP-like models fail to capture the general similarity of image and text features of novel visual concepts, but because of the confusion among competing text features, that is, they are not stable with respect to the vocabulary. In light of this, we propose to improve the openness of CLIP from the perspective of feature space by enforcing the distinguishability of text features. Our method retrieves relevant texts from the pre-training corpus to enhance prompts for inference, which boosts the extensibility and stability of CLIP even without fine-tuning.
  • Simple Open-Vocabulary Object Detection with Vision Transformers. [paper] [code]

    • Matthias Minderer, Alexey Gritsenko, Austin Stone, Maxim Neumann, Dirk Weissenborn, Alexey Dosovitskiy, Aravindh Mahendran, Anurag Arnab, Mostafa Dehghani, Zhuoran Shen, Xiao Wang, Xiaohua Zhai, Thomas Kipf, Neil Houlsby. ECCV 2022
    • Key Word: Open-vocabulary; Long-tail; Object detection; Vision Transformer.
    • Digest For object detection, pre-training and scaling approaches are less well established, especially in the long-tailed and open-vocabulary setting, where training data is relatively scarce. In this paper, we propose a strong recipe for transferring image-text models to open-vocabulary object detection. We use a standard Vision Transformer architecture with minimal modifications, contrastive image-text pre-training, and end-to-end detection fine-tuning. Our analysis of the scaling properties of this setup shows that increasing image-level pre-training and model size yield consistent improvements on the downstream detection task. We provide the adaptation strategies and regularizations needed to attain very strong performance on zero-shot text-conditioned and one-shot image-conditioned object detection.
  • OSSGAN: Open-Set Semi-Supervised Image Generation. [paper] [code]

    • Kai Katsumata, Duc Minh Vo, Hideki Nakayama. CVPR 2022
    • Key Word: Open-set Semi-supervised Image Generation; Conditional GAN.
    • Digest We introduce a challenging training scheme of conditional GANs, called open-set semi-supervised image generation, where the training dataset consists of two parts: (i) labeled data and (ii) unlabeled data with samples belonging to one of the labeled data classes, namely, a closed-set, and samples not belonging to any of the labeled data classes, namely, an open-set. Unlike the existing semi-supervised image generation task, where unlabeled data only contain closed-set samples, our task is more general and lowers the data collection cost in practice by allowing open-set samples to appear. Thanks to entropy regularization, the classifier that is trained on labeled data is able to quantify sample-wise importance to the training of cGAN as confidence, allowing us to use all samples in unlabeled data.
  • Open-World Instance Segmentation: Exploiting Pseudo Ground Truth From Learned Pairwise Affinity. [paper] [code]

    • Weiyao Wang, Matt Feiszli, Heng Wang, Jitendra Malik, Du Tran. CVPR 2022
    • Key Word: Open-world Instance Segmentation; Generic Grouping Networks; Pairwise Affinities.
    • Digest Open-world instance segmentation is the task of grouping pixels into object instances without any pre-determined taxonomy. This is challenging, as state-of-the-art methods rely on explicit class semantics obtained from large labeled datasets, and out-of-domain evaluation performance drops significantly. Here we propose a novel approach for mask proposals, Generic Grouping Networks (GGNs), constructed without semantic supervision. Our approach combines a local measure of pixel affinity with instance-level mask supervision, producing a training regimen designed to make the model as generic as the data diversity allows. We introduce a method for predicting Pairwise Affinities (PA), a learned local relationship between pairs of pixels. PA generalizes very well to unseen categories. From PA we construct a large set of pseudo-ground-truth instance masks; combined with human-annotated instance masks we train GGNs and significantly outperform the SOTA on open-world instance segmentation on various benchmarks including COCO, LVIS, ADE20K, and UVO.
  • Full-Spectrum Out-of-Distribution Detection. [paper] [code]

    • Jingkang Yang, Kaiyang Zhou, Ziwei Liu.
    • Key Word: Benchmark; Anomaly Detection; Open-set Recognition; Out-of-Distribution Generalization.
    • Digest We take into account both shift types and introduce full-spectrum OOD (FS-OOD) detection, a more realistic problem setting that considers both detecting semantic shift and being tolerant to covariate shift, and we design three benchmarks. These new benchmarks have a more fine-grained categorization of distributions (i.e., training ID, covariate-shifted ID, near-OOD, and far-OOD) for the purpose of more comprehensively evaluating the pros and cons of algorithms.
  • FS6D: Few-Shot 6D Pose Estimation of Novel Objects. [paper] [code]

    • Yisheng He, Yao Wang, Haoqiang Fan, Jian Sun, Qifeng Chen. CVPR 2022
    • Key Word: Open-World 6D Pose Estimation; Few-shot learning.
    • Digest In this work, we study a new open set problem, few-shot 6D object pose estimation: estimating the 6D pose of an unknown object from a few support views without extra training. We point out the importance of fully exploring the appearance and geometric relationship between the given support views and query scene patches and propose a dense prototype matching framework by extracting and matching dense RGBD prototypes with transformers. Moreover, we show that the priors from diverse appearances and shapes are crucial to the generalization capability and thus propose a large-scale RGBD photorealistic dataset (ShapeNet6D) for network pre-training. A simple and effective online texture blending approach is also introduced to eliminate the domain gap from the synthetic dataset, which enriches appearance diversity at a low cost.
  • PMAL: Open Set Recognition via Robust Prototype Mining. [paper] [code]

    • Jing Lu, Yunxu Xu, Hao Li, Zhanzhan Cheng, Yi Niu. AAAI 2022
    • Key Word: Open-set Recognition; Prototype Learning.
    • Digest In this work, we propose a novel Prototype Mining And Learning (PMAL) framework. It has a prototype mining mechanism before the phase of optimizing embedding space, explicitly considering two crucial properties, namely high quality and diversity of the prototype set. Concretely, a set of high-quality candidates is first extracted from training samples based on data uncertainty learning, avoiding interference from unexpected noise. Considering the multifarious appearance of objects even in a single category, a diversity-based strategy for prototype set filtering is proposed. Accordingly, the embedding space can be better optimized to discriminate among the predefined classes and between knowns and unknowns.
  • OpenTAL: Towards Open Set Temporal Action Localization. [paper] [code]

    • Wentao Bao, Qi Yu, Yu Kong. CVPR 2022
    • Key Word: Open-set Temporal Action Localization; Temporal Action Localization; Evidential Deep Learning.
    • Digest In this paper, we, for the first time, step toward the Open Set TAL (OSTAL) problem and propose a general framework OpenTAL based on Evidential Deep Learning (EDL). Specifically, OpenTAL consists of uncertainty-aware action classification, actionness prediction, and temporal location regression. With the proposed importance-balanced EDL method, classification uncertainty is learned by collecting categorical evidence majorly from important samples. To distinguish the unknown actions from background video frames, the actionness is learned by positive-unlabeled learning. The classification uncertainty is further calibrated by leveraging the guidance from the temporal localization quality. OpenTAL is general and can equip existing TAL models for open set scenarios.
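    A minimal sketch of the evidential-deep-learning quantities that OpenTAL-style methods build on: non-negative evidence defines a Dirichlet distribution whose strength yields a closed-form classification uncertainty u = K / S, which can be thresholded to reject unknown actions. The logits are placeholders, and the paper's importance-balanced training and localization-guided calibration are not shown.

    ```python
    import torch
    import torch.nn.functional as F

    def evidential_outputs(logits):
        evidence = F.softplus(logits)          # non-negative evidence per class
        alpha = evidence + 1.0                 # Dirichlet parameters
        S = alpha.sum(dim=-1, keepdim=True)    # Dirichlet strength
        probs = alpha / S                      # expected class probabilities
        uncertainty = logits.shape[-1] / S     # u = K / S, in (0, 1]
        return probs, uncertainty.squeeze(-1)

    logits = torch.tensor([[4.0, 0.1, -1.0],   # strong evidence -> lower u
                           [0.1, 0.0, 0.1]])   # little evidence -> higher u
    probs, u = evidential_outputs(logits)
    print(probs, u)
    ```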

Environmental Well-being

  • Green Learning: Introduction, Examples and Outlook. [paper]

    • C.-C. Jay Kuo, Azad M. Madni.
    • Key Word: Green Learning; Trust Learning.
    • Digest Rapid advances in artificial intelligence (AI) in the last decade have largely been built upon the wide applications of deep learning (DL). However, the high carbon footprint yielded by larger and larger DL networks becomes a concern for sustainability. Furthermore, the DL decision mechanism is somewhat obscure and can only be verified by test data. Green learning (GL) has been proposed as an alternative paradigm to address these concerns. GL is characterized by low carbon footprints, small model sizes, low computational complexity, and logical transparency. It offers energy-effective solutions in cloud centers as well as mobile/edge devices. GL also provides a clear and logical decision-making process to gain people's trust.
  • Measuring the Carbon Intensity of AI in Cloud Instances. [paper]

    • Jesse Dodge, Taylor Prewitt, Remi Tachet Des Combes, Erika Odmark, Roy Schwartz, Emma Strubell, Alexandra Sasha Luccioni, Noah A. Smith, Nicole DeCario, Will Buchanan. FAccT 2022
    • Key Word: Carbon Emissions; Cloud.
    • Digest We provide a framework for measuring software carbon intensity, and propose to measure operational carbon emissions by using location-based and time-specific marginal emissions data per energy unit. We provide measurements of operational software carbon intensity for a set of modern models for natural language processing and computer vision, and a wide range of model sizes, including pretraining of a 6.1 billion parameter language model.
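    A back-of-the-envelope sketch of the accounting described above: operational emissions are the energy drawn by a training job multiplied by the location- and time-specific marginal carbon intensity of the grid. All numbers below are illustrative placeholders, not measurements from the paper.

    ```python
    def operational_emissions_kg(gpu_hours, avg_power_kw, pue, carbon_intensity_g_per_kwh):
        energy_kwh = gpu_hours * avg_power_kw * pue            # include datacenter overhead (PUE)
        return energy_kwh * carbon_intensity_g_per_kwh / 1000  # grams -> kilograms of CO2e

    # e.g. 1,000 GPU-hours at 0.3 kW per GPU, PUE 1.1, on a 400 gCO2e/kWh grid:
    print(operational_emissions_kg(1_000, 0.3, 1.1, 400))      # ~132 kg CO2e
    ```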
  • The Carbon Footprint of Machine Learning Training Will Plateau, Then Shrink. [paper]

    • David Patterson, Joseph Gonzalez, Urs Hölzle, Quoc Le, Chen Liang, Lluis-Miquel Munguia, Daniel Rothchild, David So, Maud Texier, Jeff Dean.
    • Key Word: Carbon Footprint.
    • Digest We recommend that ML papers include emissions explicitly to foster competition on more than just model quality. Estimates of emissions in papers that omitted them have been off 100x-100,000x, so publishing emissions has the added benefit of ensuring accurate accounting. Given the importance of climate change, we must get the numbers right to make certain that we work on its biggest challenges.

Interactions with Blockchain

  • Trustworthy Federated Learning via Blockchain. [paper]

    • Zhanpeng Yang, Yuanming Shi, Yong Zhou, Zixin Wang, Kai Yang.
    • Key Word: Federated Learning; Blockchain.
    • Digest We propose a decentralized blockchain-based FL (B-FL) architecture that uses a secure global aggregation algorithm to resist malicious devices and deploys a practical Byzantine fault tolerance consensus protocol with high effectiveness and low energy consumption among multiple edge servers to prevent model tampering by a malicious server. However, to implement the B-FL system at the network edge, multiple rounds of cross-validation in the blockchain consensus protocol will induce long training latency. We thus formulate a network optimization problem that jointly considers bandwidth and power allocation for the minimization of long-term average training latency consisting of progressive learning rounds.
  • A Fast Blockchain-based Federated Learning Framework with Compressed Communications. [paper]

    • Laizhong Cui, Xiaoxin Su, Yipeng Zhou. JSAC
    • Key Word: Blockchain-based Federated Learning.
    • Digest To improve the practicality of BFL, we are among the first to propose a fast blockchain-based communication-efficient federated learning framework by compressing communications in BFL, called BCFL. Meanwhile, we derive the convergence rate of BCFL with non-convex loss. To maximize the final model accuracy, we further formulate the problem to minimize the training loss of the convergence rate subject to a limited training time with respect to the compression rate and the block generation rate, which is a bi-convex optimization problem and can be efficiently solved.
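    A minimal sketch of one common way to compress the model updates exchanged in such a framework, top-k sparsification: each client keeps only the largest-magnitude entries of its update and transmits a sparse (index, value) payload. The paper's specific compressor and its interaction with the block generation rate may differ, and the tensors below are placeholders.

    ```python
    import torch

    def topk_compress(update: torch.Tensor, compression_rate: float):
        flat = update.flatten()
        k = max(1, int(compression_rate * flat.numel()))
        _, indices = flat.abs().topk(k)
        return indices, flat[indices]          # sparse payload to broadcast / put on-chain

    def topk_decompress(indices, values, shape):
        flat = torch.zeros(shape).flatten()
        flat[indices] = values
        return flat.reshape(shape)

    update = torch.randn(4, 8)
    idx, vals = topk_compress(update, compression_rate=0.1)
    print(topk_decompress(idx, vals, update.shape))
    ```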
  • BPFISH: Blockchain and Privacy-preserving FL Inspired Smart Healthcare. [paper]

    • Moirangthem Biken Singh, Ajay Pratap.
    • Key Word: Blockchain; Federated Learning; Stable Matching; Differential Privacy; Smart Healthcare.
    • Digest This paper proposes a Federated Learning (FL) based smart healthcare system in which Medical Centers (MCs) train local models using the data collected from patients and send the model weights to the miners in a blockchain-based robust framework without sharing raw data, with privacy preservation in mind. We formulate an optimization problem by maximizing the utility and minimizing the loss function, considering the energy consumption and FL process delay of MCs for learning effective models on distributed healthcare data underlying a blockchain-based framework.
  • BEAS: Blockchain Enabled Asynchronous & Secure Federated Machine Learning. [paper] [code]

    • Arup Mondal, Harpreet Virk, Debayan Gupta.
    • Key Word: Blockchain; Federated Learning; Gradient Pruning; Differential Privacy.
    • Digest Federated Learning (FL) enables multiple parties to distributively train a ML model without revealing their private datasets. However, it assumes trust in the centralized aggregator which stores and aggregates model updates. This makes it prone to gradient tampering and privacy leakage by a malicious aggregator. Malicious parties can also introduce backdoors into the joint model by poisoning the training data or model gradients. To address these issues, we present BEAS, the first blockchain-based framework for N-party FL that provides strict privacy guarantees of training data using gradient pruning (showing improved differential privacy compared to existing noise and clipping based techniques).

Others

  • Advances, challenges and opportunities in creating data for trustworthy AI. [paper]

    • Weixin Liang, Girmaw Abebe Tadesse, Daniel Ho, Fei-Fei Li, Matei Zaharia, Ce Zhang, James Zou. Nature Machine Intelligence
    • Key Word: Trustworthy AI; Data Design; Data Sculpting; Data Strategies; Data Policy.
    • Digest Automated AI model builders that are publicly available can now achieve top performance in many applications. In contrast, the design and sculpting of the data used to develop AI often rely on bespoke manual work, and they critically affect the trustworthiness of the model. This Perspective discusses key considerations for each stage of the data-for-AI pipeline—starting from data design to data sculpting (for example, cleaning, valuation and annotation) and data evaluation—to make AI more reliable.
  • Surgical Fine-Tuning Improves Adaptation to Distribution Shifts. [paper]

    • Yoonho Lee, Annie S. Chen, Fahim Tajwar, Ananya Kumar, Huaxiu Yao, Percy Liang, Chelsea Finn.
    • Key Word: Fine-Tuning; Distribution Shift.
    • Digest A common approach to transfer learning under distribution shift is to fine-tune the last few layers of a pre-trained model, preserving learned features while also adapting to the new task. This paper shows that in such settings, selectively fine-tuning a subset of layers (which we term surgical fine-tuning) matches or outperforms commonly used fine-tuning approaches. Moreover, the type of distribution shift influences which subset is more effective to tune: for example, for image corruptions, fine-tuning only the first few layers works best.
  • Scaling up Trustless DNN Inference with Zero-Knowledge Proofs. [paper]

    • Daniel Kang, Tatsunori Hashimoto, Ion Stoica, Yi Sun.
    • Key Word: Zero-Knowledge Proof; Inference Validation.
    • Digest We present the first practical ImageNet-scale method to verify ML model inference non-interactively, i.e., after the inference has been done. To do so, we leverage recent developments in ZK-SNARKs (zero-knowledge succinct non-interactive arguments of knowledge), a form of zero-knowledge proofs. ZK-SNARKs allow us to verify ML model execution non-interactively, relying only on standard cryptographic hardness assumptions.
  • DeepMed: Semiparametric Causal Mediation Analysis with Debiased Deep Learning. [paper]

    • Siqi Xu, Lin Liu, Zhonghua Liu. NeurIPS 2022
    • Key Word: Causal Mediation Analysis; Fairness.
    • Digest Causal mediation analysis can unpack the black box of causality and is therefore a powerful tool for disentangling causal pathways in biomedical and social sciences, and also for evaluating machine learning fairness. To reduce bias for estimating Natural Direct and Indirect Effects in mediation analysis, we propose a new method called DeepMed that uses deep neural networks (DNNs) to cross-fit the infinite-dimensional nuisance functions in the efficient influence functions.
  • FedMT: Federated Learning with Mixed-type Labels. [paper]

    • Qiong Zhang, Aline Talhouk, Gang Niu, Xiaoxiao Li.
    • Key Word: Federated Learning; Neural Tangent Kernel.
    • Digest We consider an important yet under-explored setting of FL, namely FL with mixed-type labels where different labeling criteria can be employed by various centers, leading to inter-center label space differences and challenging existing FL methods designed for the classical setting. To effectively and efficiently train models with mixed-type labels, we propose a theory-guided and model-agnostic approach that can make use of the underlying correspondence between those label spaces and can be easily combined with various FL methods such as FedAvg.
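    • Example FedMT is described as model-agnostic and combinable with methods such as FedAvg; the sketch below shows only the standard FedAvg aggregation step it would plug into, not FedMT's label-space correspondence itself. The helper name and toy usage are assumptions.

```python
# Standard FedAvg aggregation (the base algorithm FedMT can be combined with);
# this is not FedMT's mixed-label machinery, just the weighted averaging step.
import copy
import torch

def fedavg(client_state_dicts, client_sizes):
    """Weighted average of client model weights by local dataset size."""
    total = float(sum(client_sizes))
    avg = copy.deepcopy(client_state_dicts[0])
    for key in avg:
        avg[key] = sum(
            sd[key].float() * (n / total)
            for sd, n in zip(client_state_dicts, client_sizes)
        )
    return avg

# Toy usage with two clients sharing the same architecture.
clients = [torch.nn.Linear(4, 2).state_dict() for _ in range(2)]
global_weights = fedavg(clients, client_sizes=[100, 300])
```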
  • Data Budgeting for Machine Learning. [paper]

    • Xinyi Zhao, Weixin Liang, James Zou.
    • Key Word: Benchmark; Data Budgeting.
    • Digest We study the data budgeting problem and formulate it as two sub-problems: predicting (1) the saturating performance given enough data, and (2) how many data points are needed to get close to that saturating performance. In contrast to traditional dataset-independent methods such as the PowerLaw fit, we propose a learning method to solve the data budgeting problem.
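    • Example The dataset-independent PowerLaw baseline mentioned above fits a saturating power-law learning curve to a few pilot measurements and extrapolates; the snippet below is that baseline only (with made-up numbers), not the paper's learned budgeting method.

```python
# Power-law learning-curve extrapolation (the PowerLaw baseline, not the
# paper's learned method). Pilot accuracies below are made up for illustration.
import numpy as np
from scipy.optimize import curve_fit

n = np.array([100, 200, 400, 800, 1600])          # pilot training-set sizes
acc = np.array([0.61, 0.67, 0.72, 0.755, 0.78])   # measured accuracies

def power_law(n, a, b, c):
    # acc(n) ~ c - a * n**(-b); c is the estimated saturating performance.
    return c - a * np.power(n, -b)

(a, b, c), _ = curve_fit(power_law, n, acc, p0=(1.0, 0.5, 0.85), maxfev=10000)
print(f"estimated saturating accuracy ~ {c:.3f}")
print(f"predicted accuracy at n=10000 ~ {power_law(10000, a, b, c):.3f}")
```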
  • I Speak, You Verify: Toward Trustworthy Neural Program Synthesis. [paper]

    • Darren Key, Wen-Ding Li, Kevin Ellis.
    • Key Word: Program Synthesis; Natural Language to Code; Large Language Models.
    • Digest We develop an approach for improving the trustworthiness and overall accuracy of program synthesizers based on large language models for source code. Given a natural language description of a programming problem, our method samples both candidate programs as well as candidate predicates specifying how the program should behave.
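    • Example A toy, self-contained version of the propose-and-check idea: candidate programs and candidate predicates are hard-coded below where a real system would sample both from a language model, and the program satisfying the most predicates is kept.

```python
# Toy propose-and-check: keep the candidate program that passes the most
# candidate predicates. Real systems sample both from an LLM; these are
# hard-coded stand-ins.
candidate_programs = [
    lambda xs: sorted(xs),        # intended behavior
    lambda xs: xs[::-1],          # plausible-but-wrong candidate
    lambda xs: list(set(xs)),     # another wrong candidate
]
candidate_predicates = [
    lambda f: f([3, 1, 2]) == [1, 2, 3],
    lambda f: len(f([1, 1, 2])) == 3,
    lambda f: f([]) == [],
]

def score(program):
    passed = 0
    for predicate in candidate_predicates:
        try:
            passed += bool(predicate(program))
        except Exception:
            pass  # a crashing predicate counts as a failure
    return passed

best = max(candidate_programs, key=score)
print(best([5, 4, 4]))  # the sorting candidate wins under these predicates
```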
  • Causal Knowledge Transfer from Task Affinity. [paper]

    • Ahmed Aloui, Juncheng Dong, Cat P. Le, Vahid Tarokh.
    • Key Word: Causal Inference; Transfer Learning; Task Similarity.
    • Digest We focus on transferring the causal knowledge acquired in prior experiments to new scenarios for which only limited data is available. To this end, we first observe that the absolute values of ITEs are invariant under the action of the symmetric group on the labels of treatments. Given this invariance, we propose a symmetrized task distance for calculating the similarity of a target scenario with those encountered before. The aforementioned task distance is then used to transfer causal knowledge from the closest of all the available previously learned tasks to the target scenario.
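    • Example With two treatment arms, relabelling the treatments flips the sign of every ITE, so a distance minimized over both labelings respects the invariance described above. The snippet below is a simplified stand-in for the paper's symmetrized task distance, using synthetic ITE estimates.

```python
# Simplified symmetrized distance between two tasks' estimated individual
# treatment effects (ITEs); relabelling the two arms flips the ITE sign.
import numpy as np

def symmetrized_ite_distance(ite_a, ite_b):
    d_same = np.mean((ite_a - ite_b) ** 2)
    d_flip = np.mean((ite_a + ite_b) ** 2)  # task b with treatment labels swapped
    return min(d_same, d_flip)

rng = np.random.default_rng(0)
ite_source = rng.normal(1.0, 0.3, size=500)
ite_target = -ite_source + rng.normal(0.0, 0.1, size=500)  # same task, labels flipped
print(symmetrized_ite_distance(ite_source, ite_target))    # small despite the sign flip
```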
  • SoK: On the Impossible Security of Very Large Foundation Models. [paper]

    • El-Mahdi El-Mhamdi, Sadegh Farhadkhani, Rachid Guerraoui, Nirupam Gupta, Lê-Nguyên Hoang, Rafael Pinot, John Stephan.
    • Key Word: Foundation Models; Security; Privacy.
    • Digest We identify several key features of today's foundation model learning problem which, given the current understanding in adversarial machine learning, suggest incompatibility of high accuracy with both security and privacy. We begin by observing that high accuracy seems to require (1) very high-dimensional models and (2) huge amounts of data that can only be procured through user-generated datasets. Moreover, such data is fundamentally heterogeneous, as users generally have very specific (easily identifiable) data-generating habits.
  • Totems: Physical Objects for Verifying Visual Integrity. [paper]

    • Jingwei Ma, Lucy Chai, Minyoung Huh, Tongzhou Wang, Ser-Nam Lim, Phillip Isola, Antonio Torralba. ECCV 2022
    • Key Word: Digital Signatures; Detecting Image Manipulations.
    • Digest We introduce a new approach to image forensics: placing physical refractive objects, which we call totems, into a scene so as to protect any photograph taken of that scene. Totems bend and redirect light rays, thus providing multiple, albeit distorted, views of the scene within a single image. A defender can use these distorted totem pixels to detect if an image has been manipulated. Our approach unscrambles the light rays passing through the totems by estimating their positions in the scene and using their known geometric and material properties.
  • Interventional Causal Representation Learning. [paper]

    • Kartik Ahuja, Yixin Wang, Divyat Mahajan, Yoshua Bengio.
    • Key Word: Interventional Data; Causal Representation Learning.
    • Digest We explore the role of interventional data for identifiable representation learning in this work. We study the identifiability of latent causal factors with and without interventional data, under minimal distributional assumptions on the latents. We prove that, if the true latent variables map to the observed high-dimensional data via a polynomial function, then representation learning via minimizing the standard reconstruction loss of autoencoders identifies the true latents up to affine transformation.
  • NashAE: Disentangling Representations through Adversarial Covariance Minimization. [paper] [code]

    • Eric Yeats, Frank Liu, David Womble, Hai Li. ECCV 2022
    • Key Word: Disentangled Representation.
    • Digest We present a self-supervised method to disentangle factors of variation in high-dimensional data that does not rely on prior knowledge of the underlying variation profile (e.g., no assumptions on the number or distribution of the individual latent variables to be extracted). In this method which we call NashAE, high-dimensional feature disentanglement is accomplished in the low-dimensional latent space of a standard autoencoder (AE) by promoting the discrepancy between each encoding element and information of the element recovered from all other encoding elements.
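    • Example A schematic of the adversarial-predictability game in an autoencoder latent space (NashAE-inspired, not the paper's exact objective): per-dimension regressors try to recover each latent element from the others, and the autoencoder reconstructs the input while making them fail. Data, dimensions, and loss weights are arbitrary choices for this sketch.

```python
# Schematic adversarial disentanglement in an AE latent space (NashAE-inspired,
# not the paper's exact formulation). Random noise stands in for real data.
import torch
import torch.nn as nn

x_dim, latent_dim = 20, 4
enc = nn.Sequential(nn.Linear(x_dim, 64), nn.ReLU(), nn.Linear(64, latent_dim))
dec = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, x_dim))
predictors = nn.ModuleList(nn.Linear(latent_dim - 1, 1) for _ in range(latent_dim))

opt_ae = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)
opt_pred = torch.optim.Adam(predictors.parameters(), lr=1e-3)

def predict_each_from_rest(z):
    # Predict each latent dimension from all the other dimensions.
    cols = [p(torch.cat([z[:, :i], z[:, i + 1:]], dim=1))
            for i, p in enumerate(predictors)]
    return torch.cat(cols, dim=1)

for _ in range(200):
    x = torch.randn(64, x_dim)  # stand-in data
    # (1) Predictors learn to recover each latent element from the rest.
    z = enc(x).detach()
    opt_pred.zero_grad()
    ((predict_each_from_rest(z) - z) ** 2).mean().backward()
    opt_pred.step()
    # (2) Autoencoder reconstructs x while making the predictors fail.
    opt_ae.zero_grad()
    z = enc(x)
    recon = ((dec(z) - x) ** 2).mean()
    adv = -((predict_each_from_rest(z) - z) ** 2).mean()
    (recon + 0.1 * adv).backward()
    opt_ae.step()
```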
  • A Survey of Deep Causal Model. [paper]

    • Zongyu Li, Zhenfeng Zhu.
    • Key Word: Causality; Survey.
    • Digest This paper surveys deep causal models; its core contributions are as follows: 1) we provide relevant metrics under multiple treatments and continuous-dose treatment; 2) we give a comprehensive overview of deep causal models from both the temporal-development and method-classification perspectives; 3) we provide a detailed and comprehensive classification and analysis of relevant datasets and source code.
  • Introspective Learning : A Two-Stage Approach for Inference in Neural Networks. [paper] [code]

    • Mohit Prabhushankar, Ghassan AlRegib. NeurIPS 2022
    • Key Word: Active Learning; Out-of-Distribution Detection; Uncertainty Estimation; Image Quality Assessment.
    • Digest We advocate for two stages in a neural network's decision making process. The first is the existing feed-forward inference framework where patterns in given data are sensed and associated with previously learned patterns. The second stage is a slower reflection stage where we ask the network to reflect on its feed-forward decision by considering and evaluating all available choices. Together, we term the two stages as introspective learning. We use gradients of trained neural networks as a measurement of this reflection. A simple three-layered Multi Layer Perceptron is used as the second stage that predicts based on all extracted gradient features.
  • Making Intelligence: Ethics, IQ, and ML Benchmarks. [paper]

    • Borhane Blili-Hamelin, Leif Hancox-Li.
    • Key Word: Ethics.
    • Digest The ML community recognizes the importance of anticipating and mitigating the potential negative impacts of benchmark research. In this position paper, we argue that more attention needs to be paid to areas of ethical risk that lie at the technical and scientific core of ML benchmarks. We identify overlooked structural similarities between human IQ and ML benchmarks. Human intelligence and ML benchmarks share similarities in setting standards for describing, evaluating and comparing performance on tasks relevant to intelligence. This enables us to unlock lessons from feminist philosophy of science scholarship that need to be considered by the ML benchmark community. Finally, we outline practical recommendations for benchmark research ethics and ethics review.
  • Fundamentals of Task-Agnostic Data Valuation. [paper]

    • Mohammad Mohammadi Amiri, Frederic Berdoz, Ramesh Raskar.
    • Key Word: Task-Agnostic Data Valuation.
    • Digest We focus on task-agnostic data valuation without any validation requirements. The data buyer has access to a limited amount of data (which could be publicly available) and seeks more data samples from a data seller. We formulate the problem as estimating the differences in the statistical properties of the data at the seller with respect to the baseline data available at the buyer.
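    • Example A deliberately crude illustration of comparing the statistical properties of a seller's data against a buyer's baseline using only first and second moments; the paper develops more principled diversity and relevance measures, so treat the metric below as a placeholder only.

```python
# Crude "difference in statistical properties" between buyer and seller data:
# gaps in the mean vector and covariance matrix (a placeholder metric, not the
# paper's valuation functions).
import numpy as np

def moment_gaps(buyer_X, seller_X):
    mean_gap = np.linalg.norm(buyer_X.mean(axis=0) - seller_X.mean(axis=0))
    cov_gap = np.linalg.norm(np.cov(buyer_X, rowvar=False) - np.cov(seller_X, rowvar=False))
    return mean_gap, cov_gap

rng = np.random.default_rng(0)
buyer = rng.normal(0.0, 1.0, size=(200, 5))     # limited baseline data
seller = rng.normal(0.5, 1.5, size=(1000, 5))   # candidate data for sale
print(moment_gaps(buyer, seller))
```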
  • Language Models (Mostly) Know What They Know. [paper]

    • Saurav Kadavath, Tom Conerly, Amanda Askell, Tom Henighan, Dawn Drain, Ethan Perez, Nicholas Schiefer, Zac Hatfield Dodds, Nova DasSarma, Eli Tran-Johnson, Scott Johnston, Sheer El-Showk, Andy Jones, Nelson Elhage, Tristan Hume, Anna Chen, Yuntao Bai, Sam Bowman, Stanislav Fort, Deep Ganguli, Danny Hernandez, Josh Jacobson, Jackson Kernion, Shauna Kravec, Liane Lovitt, Kamal Ndousse, Catherine Olsson, Sam Ringer, Dario Amodei, Tom Brown, Jack Clark, Nicholas Joseph, Ben Mann, Sam McCandlish, Chris Olah, Jared Kaplan.
    • Key Word: Language Models; Calibration.
    • Digest We study whether language models can evaluate the validity of their own claims and predict which questions they will be able to answer correctly. We first show that larger models are well-calibrated on diverse multiple choice and true/false questions when they are provided in the right format. Thus we can approach self-evaluation on open-ended sampling tasks by asking models to first propose answers, and then to evaluate the probability "P(True)" that their answers are correct. We find encouraging performance, calibration, and scaling for P(True) on a diverse array of tasks.
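    • Example The P(True) self-evaluation can be illustrated with any causal language model: prompt it to judge a proposed answer and compare the next-token probabilities of "True" and "False". GPT-2 below is only a stand-in and will not be well calibrated; the paper's findings concern much larger models.

```python
# Read off P(True) for a proposed answer from a causal LM's next-token
# probabilities. GPT-2 is a small stand-in; the paper studies far larger models.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = (
    "Question: What is the capital of France?\n"
    "Proposed Answer: Paris\n"
    "Is the proposed answer correct? (True/False):"
)
inputs = tok(prompt, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits[0, -1]  # next-token logits
probs = torch.softmax(logits, dim=-1)

true_id = tok(" True", add_special_tokens=False)["input_ids"][0]
false_id = tok(" False", add_special_tokens=False)["input_ids"][0]
p_true = probs[true_id] / (probs[true_id] + probs[false_id])
print(f"P(True) ~ {p_true.item():.3f}")
```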
  • Repairing Neural Networks by Leaving the Right Past Behind. [paper]

    • Ryutaro Tanno, Melanie F. Pradier, Aditya Nori, Yingzhen Li.
    • Key Word: Bayesian Continual Unlearning; Model Repairment.
    • Digest This work draws on the Bayesian view of continual learning, and develops a generic framework for both, identifying training examples that have given rise to the target failure, and fixing the model through erasing information about them. This framework naturally allows leveraging recent advances in continual learning to this new problem of model repairment, while subsuming the existing works on influence functions and data deletion as specific instances. Experimentally, the proposed approach outperforms the baselines for both identification of detrimental training data and fixing model failures in a generalisable manner.
  • Mechanisms that Incentivize Data Sharing in Federated Learning. [paper]

    • Sai Praneeth Karimireddy, Wenshuo Guo, Michael I. Jordan.
    • Key Word: Data Maximization Incentivization; Federated Learning; Contract Theory.
    • Digest Federated learning is typically considered a beneficial technology which allows multiple agents to collaborate with each other, improve the accuracy of their models, and solve problems which are otherwise too data-intensive / expensive to be solved individually. However, under the expectation that other agents will share their data, rational agents may be tempted to engage in detrimental behavior such as free-riding where they contribute no data but still enjoy an improved model. In this work, we propose a framework to analyze the behavior of such rational data generators.
  • On the Need and Applicability of Causality for Fair Machine Learning. [paper]

    • Rūta Binkytė, Sami Zhioua.
    • Key Word: Causality; Fairness.
    • Digest Causal reasoning has an indispensable role in how humans make sense of the world and come to decisions in everyday life. While 20th-century science refrained from making causal claims, regarding them as too strong and not achievable, the 21st century is marked by the return of causality, encouraged by the mathematization of causal notions and the introduction of the non-deterministic concept of cause. Besides its common use cases in epidemiology and the political and social sciences, causality turns out to be crucial in evaluating the fairness of automated decisions, in both a legal and an everyday sense. We provide arguments and examples of why causality is particularly important for fairness evaluation.
  • Robustness of Epinets against Distributional Shifts. [paper]

    • Xiuyuan Lu, Ian Osband, Seyed Mohammad Asghari, Sven Gowal, Vikranth Dwaracherla, Zheng Wen, Benjamin Van Roy.
    • Key Word: Epinets; Uncertainty; Distribution Shifts.
    • Digest Recent work introduced the epinet as a new approach to uncertainty modeling in deep learning. An epinet is a small neural network added to traditional neural networks, which, together, can produce predictive distributions. In particular, using an epinet can greatly improve the quality of joint predictions across multiple inputs, a measure of how well a neural network knows what it does not know. In this paper, we examine whether epinets can offer similar advantages under distributional shifts. We find that, across ImageNet-A/O/C, epinets generally improve robustness metrics.
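    • Example A toy PyTorch epinet in the spirit of the construction described above (sizes and architecture are arbitrary): a small network conditioned on a random epistemic index perturbs the base network's logits, and sampling several indices for the same inputs yields a joint predictive distribution.

```python
# Toy epinet: base logits plus an index-conditioned perturbation. Sampling many
# epistemic indices z gives a distribution over predictions for the same input.
import torch
import torch.nn as nn

class TinyEpinet(nn.Module):
    def __init__(self, in_dim=16, feat_dim=32, num_classes=10, index_dim=8):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU())
        self.head = nn.Linear(feat_dim, num_classes)
        self.epinet = nn.Sequential(
            nn.Linear(feat_dim + index_dim, 32), nn.ReLU(), nn.Linear(32, num_classes)
        )
        self.index_dim = index_dim

    def forward(self, x, z):
        feats = self.trunk(x)
        base_logits = self.head(feats)
        # The epinet sees stop-gradient features together with the index z.
        return base_logits + self.epinet(torch.cat([feats.detach(), z], dim=-1))

net = TinyEpinet()
x = torch.randn(4, 16)
samples = torch.stack(
    [net(x, torch.randn(4, net.index_dim)) for _ in range(10)]
)  # (num_indices, batch, num_classes): a joint predictive distribution over z
```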
  • Causal Machine Learning: A Survey and Open Problems. [paper]

    • Jean Kaddour, Aengus Lynch, Qi Liu, Matt J. Kusner, Ricardo Silva.
    • Key Word: Causality; Survey.
    • Digest Causal Machine Learning (CausalML) is an umbrella term for machine learning methods that formalize the data-generation process as a structural causal model (SCM). This allows one to reason about the effects of changes to this process (i.e., interventions) and what would have happened in hindsight (i.e., counterfactuals). We categorize work in CausalML into five groups according to the problems they tackle: (1) causal supervised learning, (2) causal generative modeling, (3) causal explanations, (4) causal fairness, (5) causal reinforcement learning.
  • Can Foundation Models Talk Causality? [paper]

    • Moritz Willig, Matej Zečević, Devendra Singh Dhami, Kristian Kersting.
    • Key Word: Foundation Models; Causality.
    • Digest Foundation models are the subject of an ongoing heated debate, leaving open the question of progress towards AGI and dividing the community into two camps: those who see the arguably impressive results as evidence for the scaling hypothesis, and those who worry about the lack of interpretability and reasoning capabilities. By investigating to what extent causal representations might be captured by these large-scale language models, we make a humble effort towards resolving the ongoing philosophical conflict.
  • X-Risk Analysis for AI Research. [paper]

    • Dan Hendrycks, Mantas Mazeika.
    • Key Word: AI Risk.
    • Digest Artificial intelligence (AI) has the potential to greatly improve society, but as with any powerful technology, it comes with heightened risks and responsibilities. Current AI research lacks a systematic discussion of how to manage long-tail risks from AI systems, including speculative long-term risks. Keeping in mind the potential benefits of AI, there is some concern that building ever more intelligent and powerful AI systems could eventually result in systems that are more powerful than us; some say this is like playing with fire and speculate that this could create existential risks (x-risks). To add precision and ground these discussions, we provide a guide for how to analyze AI x-risk.
  • BaCaDI: Bayesian Causal Discovery with Unknown Interventions. [paper]

    • Alexander Hägele, Jonas Rothfuss, Lars Lorch, Vignesh Ram Somnath, Bernhard Schölkopf, Andreas Krause.
    • Key Word: Causal Discovery.
    • Digest Learning causal structures from observation and experimentation is a central task in many domains. For example, in biology, recent advances allow us to obtain single-cell expression data under multiple interventions such as drugs or gene knockouts. However, a key challenge is that often the targets of the interventions are uncertain or unknown. Thus, standard causal discovery methods can no longer be used. To fill this gap, we propose a Bayesian framework (BaCaDI) for discovering the causal structure that underlies data generated under various unknown experimental/interventional conditions.
  • Differentiable Invariant Causal Discovery. [paper]

    • Yu Wang, An Zhang, Xiang Wang, Xiangnan He, Tat-Seng Chua.
    • Key Word: Causal Discovery.
    • Digest This paper proposes Differentiable Invariant Causal Discovery (DICD), utilizing the multi-environment information based on a differentiable framework to avoid learning spurious edges and wrong causal directions. Specifically, DICD aims to discover the environment-invariant causation while removing the environment-dependent correlation. We further formulate the constraint that enforces the target structure equation model to maintain optimal across the environments.
  • AI and Ethics -- Operationalising Responsible AI. [paper]

    • Liming Zhu, Xiwei Xu, Qinghua Lu, Guido Governatori, Jon Whittle.
    • Key Word: Survey; Ethics; Responsibility.
    • Digest In the last few years, AI has continued to demonstrate its positive impact on society, though sometimes with ethically questionable consequences. Building and maintaining public trust in AI has been identified as key to successful and sustainable innovation. This chapter discusses the challenges of operationalizing ethical AI principles and presents an integrated view that covers high-level ethical AI principles, the general notion of trust/trustworthiness, and product/process support in the context of responsible AI, which helps improve both the trust and trustworthiness of AI for a wider set of stakeholders.
  • State of AI Ethics Report (Volume 6, February 2022). [paper]

    • Abhishek Gupta, Connor Wright, Marianna Bergamaschi Ganapini, Masa Sweidan, Renjie Butalid.
    • Key Word: Report; Ethics.
    • Digest This report from the Montreal AI Ethics Institute (MAIEI) covers the most salient progress in research and reporting over the second half of 2021 in the field of AI ethics. Particular emphasis is placed on an "Analysis of the AI Ecosystem", "Privacy", "Bias", "Social Media and Problematic Information", "AI Design and Governance", "Laws and Regulations", "Trends", and other areas covered in the "Outside the Boxes" section. The two AI spotlights feature application pieces on "Constructing and Deconstructing Gender with AI-Generated Art" as well as "Will an Artificial Intellichef be Cooking Your Next Meal at a Michelin Star Restaurant?".
  • Optimal transport for causal discovery. [paper]

    • Ruibo Tu, Kun Zhang, Hedvig Kjellström, Cheng Zhang. ICLR 2022
    • Key Word: Causal Discovery; Optimal Transport.
    • Digest To determine causal relationships between two variables, approaches based on Functional Causal Models (FCMs) have been proposed by properly restricting model classes; however, the performance is sensitive to the model assumptions, which makes it difficult to use. In this paper, we provide a novel dynamical-system view of FCMs and propose a new framework for identifying causal direction in the bivariate case. We first show the connection between FCMs and optimal transport, and then study optimal transport under the constraints of FCMs.

Related Awesome Lists

Robustness Lists

Privacy Lists

Fairness Lists

Interpretability Lists

Open-World Lists

Blockchain Lists

Other Lists

Toolboxes

Robustness Toolboxes

  • Cleverhans

    • This repository contains the source code for CleverHans, a Python library to benchmark machine learning systems' vulnerability to adversarial examples.
  • Adversarial Robustness Toolbox (ART)

    • Adversarial Robustness Toolbox (ART) is a Python library for Machine Learning Security. ART provides tools that enable developers and researchers to evaluate, defend, certify and verify Machine Learning models and applications against the adversarial threats of Evasion, Poisoning, Extraction, and Inference.
  • AdverTorch

    • AdverTorch is a Python toolbox for adversarial robustness research. Its primary functionalities are implemented in PyTorch. Specifically, AdverTorch contains modules for generating adversarial perturbations and defending against adversarial examples, as well as scripts for adversarial training.
  • RobustBench

    • A standardized benchmark for adversarial robustness.
  • DeepDG: OOD generalization toolbox

    • A domain generalization toolbox for research purpose.

Privacy Toolboxes

  • Diffprivlib

    • Diffprivlib is a general-purpose library for experimenting with, investigating and developing applications in, differential privacy.
  • Privacy Meter

    • Privacy Meter is an open-source library to audit data privacy in statistical and machine learning algorithms.
  • PrivacyRaven

    • PrivacyRaven is a privacy testing library for deep learning systems.
  • PersonalizedFL

    • PersonalizedFL is a toolbox for personalized federated learning.

Fairness Toolboxes

  • AI Fairness 360

    • The AI Fairness 360 toolkit is an extensible open-source library containing techniques developed by the research community to help detect and mitigate bias in machine learning models throughout the AI application lifecycle.
  • Fairlearn

    • Fairlearn is a Python package that empowers developers of artificial intelligence (AI) systems to assess their system's fairness and mitigate any observed unfairness issues.
  • Aequitas

    • Aequitas is an open-source bias audit toolkit for data scientists, machine learning researchers, and policymakers to audit machine learning models for discrimination and bias, and to make informed and equitable decisions around developing and deploying predictive tools.
  • FAT Forensics

    • FAT Forensics implements the state of the art fairness, accountability and transparency (FAT) algorithms for the three main components of any data modelling pipeline: data (raw data and features), predictive models and model predictions.

Interpretability Toolboxes

  • Lime

    • This project is about explaining what machine learning classifiers (or models) are doing.
  • InterpretML

    • InterpretML is an open-source package that incorporates state-of-the-art machine learning interpretability techniques under one roof.
  • Deep Visualization Toolbox

    • This is the code required to run the Deep Visualization Toolbox, as well as to generate the neuron-by-neuron visualizations using regularized optimization.
  • Captum

    • Captum is a model interpretability and understanding library for PyTorch.
  • Alibi

    • Alibi is an open source Python library aimed at machine learning model inspection and interpretation.
  • AI Explainability 360

    • The AI Explainability 360 toolkit is an open-source library that supports interpretability and explainability of datasets and machine learning models.

Other Toolboxes

Workshops

Robustness Workshops

Privacy Workshops

Interpretability Workshops

Other Workshops

Tutorials

Robustness Tutorials

Talks

Robustness Talks

Blogs

Robustness Blogs

Interpretability Blogs

Other Blogs

Other Resources

Contributing

TBD
