List of Papers on AI + Security
Keywords:
- Adversarial Machine Learning
- Attack & Defense
- Robustness & Interpretability
- Privacy & Watermark
- Federated Learning

Papers:
- Semantic-Preserving Adversarial Text Attacks. Xinghao Yang, Weifeng Liu, James Bailey, Tianqing Zhu, Dacheng Tao, Wei Liu. [PDF]
- Defense against Adversarial Attacks in NLP via Dirichlet Neighborhood Ensemble. Yi Zhou, Xiaoqing Zheng, Cho-Jui Hsieh, Kai-Wei Chang, Xuanjing Huang. ACL 2021. [PDF]
- Hidden Killer: Invisible Textual Backdoor Attacks with Syntactic Trigger. Fanchao Qi, Mukai Li, Yangyi Chen, Zhengyan Zhang, Zhiyuan Liu, Yasheng Wang, Maosong Sun. ACL 2021. [PDF]
- Using Adversarial Attacks to Reveal the Statistical Bias in Machine Reading Comprehension Models. Jieyu Lin, Jiajie Zou and Nai Ding. ACL 2021. [PDF]
- An Empirical Study on Adversarial Attack on NMT: Languages and Positions Matter. Zhiyuan Zeng and Deyi Xiong. ACL 2021. [PDF]
- A Sweet Rabbit Hole by DARCY: Using Honeypots to Detect Universal Trigger’s Adversarial Attacks. Thai Le, Noseong Park and Dongwon Lee. ACL 2021. [PDF]
- Turn the Combination Lock: Learnable Textual Backdoor Attacks via Word Substitution. Fanchao Qi, Yuan Yao, Sophia Xu, Zhiyuan Liu and Maosong Sun. ACL 2021. [PDF]
- Rethinking Stealthiness of Backdoor Attack against NLP Models. Wenkai Yang, Yankai Lin, Peng Li, Jie Zhou and Xu Sun. ACL 2021. [PDF]
- Are VQA Systems RAD? Measuring Robustness to Augmented Data with Focused Interventions. Daniel Rosenberg, Itai Gat, Amir Feder and Roi Reichart. ACL 2021. [PDF] [CODE]
- CLINE: Contrastive Learning with Semantic Negative Examples for Natural Language Understanding. Dong Wang, Ning Ding, Piji Li, Hai-Tao Zheng. ACL 2021. [PDF]
- Differential Privacy for Text Analytics via Natural Text Sanitization. Xiang Yue, Minxin Du, Tianhao Wang, Yaliang Li, Huan Sun, Sherman S. M. Chow. ACL 2021. [PDF]
- Generating Fluent Adversarial Examples for Natural Languages. Huangzhao Zhang, Hao Zhou, Ning Miao, Lei Li. ACL 2019. [PDF]
- A Targeted Attack on Black-Box Neural Machine Translation with Parallel Data Poisoning. Chang Xu, Jun Wang, Yuqing Tang, Francisco Guzmán, Benjamin I. P. Rubinstein. The Web Conference 2021. [PDF]
- Generating Natural Language Attacks in a Hard Label Black Box Setting. Rishabh Maheshwary, Saket Maheshwary, Vikram Pudi. AAAI 2021. [PDF]
- Beyond Accuracy: Behavioral Testing of NLP Models with CheckList. Marco Tulio Ribeiro, Tongshuang Wu, Carlos Guestrin, Sameer Singh. ACL 2020. [PDF]
- Bigram and Unigram Based Text Attack via Adaptive Monotonic Heuristic Search. Xinghao Yang, Weifeng Liu, James Bailey, Dacheng Tao, Wei Liu. AAAI 2021. [PDF]
- Argot: Generating Adversarial Readable Chinese Texts. Zihan Zhang, Mingxuan Liu, Chao Zhang, Yiming Zhang, Zhou Li, Qi Li, Haixin Duan, Donghong Sun. IJCAI 2020. [PDF]
- Generating Natural Language Adversarial Examples through Probability Weighted Word Saliency. Shuhuai Ren, Yihe Deng, Kun He, Wanxiang Che. ACL 2019. [PDF]
- Greedy Attack and Gumbel Attack: Generating Adversarial Examples for Discrete Data. Puyudi Yang, Jianbo Chen, Cho-Jui Hsieh, Jane-Ling Wang, Michael I. Jordan. JMLR 2020. [PDF]
- Humpty Dumpty: Controlling Word Meanings via Corpus Poisoning. Roei Schuster, Tal Schuster, Yoav Meri, Vitaly Shmatikov. IEEE S&P 2020. [PDF]
- Is BERT Really Robust? A Strong Baseline for Natural Language Attack on Text Classification and Entailment. Di Jin, Zhijing Jin, Joey Tianyi Zhou, Peter Szolovits. AAAI 2020. [PDF]
- On the Robustness of Language Encoders against Grammatical Errors. Fan Yin, Quanyu Long, Tao Meng, Kai-Wei Chang. ACL 2020. [PDF]
- Structure-Invariant Testing for Machine Translation. Pinjia He, Clara Meister, Zhendong Su. ICSE 2020. [PDF]
- Attackability Characterization of Adversarial Evasion Attack on Discrete Data. Yutong Wang, Yufei Han, Hongyan Bao, Yun Shen, Fenglong Ma, Jin Li, Xiangliang Zhang. SIGKDD 2020. [PDF]
- Word-level Textual Adversarial Attacking as Combinatorial Optimization. Yuan Zang, Fanchao Qi, Chenghao Yang, Zhiyuan Liu, Meng Zhang, Qun Liu, Maosong Sun. ACL 2020. [PDF]
- Bayesian Attention Belief Networks. Shujian Zhang, Xinjie Fan, Bo Chen, Mingyuan Zhou. ICML 2021. [PDF] [theory]
- Fast and Precise Certification of Transformers. Gregory Bonaert, Dimitar I. Dimitrov, Maximilian Baader, Martin Vechev. PLDI 2021. [PDF] [theory]
- A Robust Adversarial Training Approach to Machine Reading Comprehension. Kai Liu, Xin Liu, An Yang, Jing Liu, Jinsong Su, Sujian Li, Qiaoqiao She. AAAI 2020. [PDF]
- SAFER: A Structure-free Approach for Certified Robustness to Adversarial Word Substitutions. Mao Ye, Chengyue Gong, Qiang Liu. ACL 2020. [PDF]
- Combating Adversarial Misspellings with Robust Word Recognition. Danish Pruthi, Bhuwan Dhingra, Zachary C. Lipton. ACL 2019. [PDF]
- Joint Character-Level Word Embedding and Adversarial Stability Training to Defend Adversarial Text. Hui Liu, Yongzheng Zhang, Yipeng Wang, Zheng Lin, Yige Chen. AAAI 2020. [PDF]
- Leveraging Adversarial Training in Self-Learning for Cross-Lingual Text Classification. Xin Dong, Yaxin Zhu, Yupeng Zhang, Zuohui Fu, Dongkuan Xu, Sen Yang, Gerard de Melo. SIGIR 2020. [PDF]
- NAT: Noise-Aware Training for Robust Neural Sequence Labeling. Marcin Namysl, Sven Behnke, Joachim Köhler. ACL 2020. [PDF]
- Pretrained Transformers Improve Out-of-Distribution Robustness. Dan Hendrycks, Xiaoyuan Liu, Eric Wallace, Adam Dziedzic, Rishabh Krishnan, Dawn Song. ACL 2020. [PDF]
- Robust Encodings: A Framework for Combating Adversarial Typos. Erik Jones, Robin Jia, Aditi Raghunathan, Percy Liang. ACL 2020. [PDF]
- Robust Neural Machine Translation with Doubly Adversarial Inputs. Yong Cheng, Lu Jiang, Wolfgang Macherey. ACL 2019. [PDF]
- TEXTSHIELD: Robust Text Classification Based on Multimodal Embedding and Neural Machine Translation. Jinfeng Li, Tianyu Du, Shouling Ji, Rong Zhang, Quan Lu, Min Yang, Ting Wang. USENIX 2020. [PDF]
- Syntactic Data Augmentation Increases Robustness to Inference Heuristics. Junghyun Min, R. Thomas McCoy, Dipanjan Das, Emily Pitler, Tal Linzen. ACL 2020. [PDF]
- Generating Adversarial Examples for Holding Robustness of Source Code Processing Models. Huangzhao Zhang, Zhuo Li, Ge Li, Lei Ma, Yang Liu, Zhi Jin. AAAI 2020. [PDF]
- Attention Please: Your Attention Check Questions in Survey Studies Can Be Automatically Answered. Weiping Pei, Arthur Mayer, Kaylynn Tu, Chuan Yue. The Web Conference 2020. [PDF]
- BERT & Family Eat Word Salad: Experiments with Text Understanding. Ashim Gupta, Giorgi Kvernadze, Vivek Srikumar. AAAI 2021. [PDF]
- Evaluating and Enhancing the Robustness of Neural Network-based Dependency Parsing Models with Adversarial Examples. Xiaoqing Zheng, Jiehang Zeng, Yi Zhou, Cho-Jui Hsieh, Minhao Cheng, Xuanjing Huang. ACL 2020. [PDF]
- Imitation Attacks and Defenses for Black-box Machine Translation Systems. Eric Wallace, Mitchell Stern, Dawn Song. EMNLP 2020. [PDF]
- Improving the Robustness of Question Answering Systems to Question Paraphrasing. Wee Chung Gan, Hwee Tou Ng. ACL 2019. [PDF]
- Crafting Adversarial Examples for Neural Machine Translation. Xinze Zhang, Junzhe Zhang, Zhenhua Chen, Kun He. ACL 2021. [PDF]
- Adversarial Training with Fast Gradient Projection Method against Synonym Substitution Based Text Attacks. Xiaosen Wang, Yichen Yang, Yihe Deng, Kun He. AAAI 2021. [PDF]
- Selective Differential Privacy for Language Modeling. Weiyan Shi, Aiqi Cui, Evan Li, Ruoxi Jia, Zhou Yu. arXiv:2108.12944. [PDF]
- Searching for an Effective Defender: Benchmarking Defense against Adversarial Word Substitution. Zongyi Li, Jianhan Xu, Jiehang Zeng, Linyang Li, Xiaoqing Zheng, Qi Zhang, Kai-Wei Chang, Cho-Jui Hsieh. EMNLP 2021. [PDF]
- DropAttack: A Masked Weight Adversarial Training Method to Improve Generalization of Neural Networks. Shiwen Ni, Jiawen Li, Hung-Yu Kao. arXiv:2108.12805. [PDF]
- CAPE: Context-Aware Private Embeddings for Private Language Learning. Richard Plant, Dimitra Gkatzia, Valerio Giuffrida. EMNLP 2021. [PDF]
- Backdoor Attacks on Pre-trained Models by Layerwise Weight Poisoning. Linyang Li, Demin Song, Xiaonan Li, Jiehang Zeng, Ruotian Ma, Xipeng Qiu. EMNLP 2021. [PDF]
- Beyond Model Extraction: Imitation Attack for Black-Box NLP APIs. Qiongkai Xu, Xuanli He, Lingjuan Lyu, Lizhen Qu, Gholamreza Haffari. arXiv:2108.13873. [PDF]
- Efficient Combinatorial Optimization for Word-level Adversarial Textual Attack. Shengcai Liu, Ning Lu, Cheng Chen, Ke Tang. arXiv:2109.02229. [PDF]
- Training Meta-Surrogate Model for Transferable Adversarial Attack. Yunxiao Qin, Yuanhao Xiong, Jinfeng Yi, Cho-Jui Hsieh. arXiv:2109.01983. [PDF]
- Practical and Secure Federated Recommendation with Personalized Masks. Liu Yang, Ben Tan, Bo Liu, Vincent W. Zheng, Kai Chen, Qiang Yang. arXiv:2109.02464. [PDF]
- Black-Box Attacks on Sequential Recommenders via Data-Free Model Extraction. Zhenrui Yue, Zhankui He, Huimin Zeng, Julian McAuley. RECSYS 2021. [PDF]
- Membership Inference Attacks Against Recommender Systems. Minxing Zhang, Zhaochun Ren, Zihan Wang, Pengjie Ren, Zhumin Chen, Pengfei Hu, Yang Zhang. CCS 2021. [PDF]
- Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP. Timo Schick, Sahana Udupa, Hinrich Schütze. TACL 2021. [PDF]
- A Strong Baseline for Query Efficient Attacks in a Black Box Setting. Rishabh Maheshwary, Saket Maheshwary, Vikram Pudi. EMNLP 2021. [PDF]
- Adversarial Examples for Evaluating Math Word Problem Solvers. Vivek Kumar, Rishabh Maheshwary, Vikram Pudi. EMNLP 2021. [PDF]
- Large Language Models Can Be Strong Differentially Private Learners. Xuechen Li, Florian Tramèr, Percy Liang, Tatsunori Hashimoto. arXiv 2021. [PDF]
- Text Detoxification using Large Pre-trained Neural Models. David Dale, Anton Voronov, Daryna Dementieva, Varvara Logacheva, Olga Kozlova, Nikita Semenov, Alexander Panchenko. EMNLP 2021. [PDF]
- Gradient-based Adversarial Attacks against Text Transformers. Chuan Guo, Alexandre Sablayrolles, Hervé Jégou and Douwe Kiela. EMNLP 2021. [PDF]
- ONION: A Simple and Effective Defense Against Textual Backdoor Attacks. Fanchao Qi, Yangyi Chen, Mukai Li, Yuan Yao, Zhiyuan Liu and Maosong Sun. EMNLP 2021. [PDF]
- Learning to Ignore Adversarial Attacks. [PDF]
- FRSUM: Towards Faithful Abstractive Summarization via Enhancing Factual Robustness. [PDF]
- GradMask: Gradient-Guided Token Masking for Textual Adversarial Example Detection. [PDF]
- Flooding-X: Improving BERT's Resistance to Adversarial Attacks via Loss-Restricted Fine-Tuning. [PDF]
- Robust and Effective Grammatical Error Correction with Simple Cycle Self-Augmenting. [PDF]
- Perturbation-based Self-supervised Attention for Text Classification. [PDF]
- Detection of Adversarial Examples in NLP: Benchmark and Baseline via Robust Density Estimation. [PDF]
- Probing the Robustness of Trained Metrics for Conversational Dialogue Systems. [PDF]
- On the Safety of Conversational Models: Taxonomy, Dataset, and Benchmark. [PDF]
- BufferSearch: Generating Black-Box Adversarial Texts With Lower Queries. [PDF]
- Generating Authentic Adversarial Examples beyond Meaning-preserving with Doubly Round-trip Translation. [PDF]
- Removing Out-of-Distribution Data Improves Adversarial Robustness. [PDF]
- Evaluating the Robustness of Neural Language Models to Input Perturbations. Milad Moradi, Matthias Samwald. EMNLP 2021. [PDF]
- Discretized Integrated Gradients for Explaining Language Models. Soumya Sanyal, Xiang Ren. EMNLP 2021. [PDF]
- Achieving Model Robustness through Discrete Adversarial Training. Maor Ivgi and Jonathan Berant. EMNLP 2021. [PDF]
- Adversarial Preprocessing: Understanding and Preventing Image-Scaling Attacks in Machine Learning. Erwin Quiring, David Klein, Daniel Arp, Martin Johns, Konrad Rieck, TU Braunschweig. USENIX 2020. [PDF]
- Amora: Black-box Adversarial Morphing Attack. Run Wang, Felix Juefei-Xu, Qing Guo, Yihao Huang, Xiaofei Xie, Lei Ma, Yang Liu. ACMMM 2020. [PDF]
- Learning Ordered Top-k Adversarial Attacks via Adversarial Distillation. Zekun Zhang, Tianfu Wu. CVPR 2020. [PDF]
- Towards Feature Space Adversarial Attack by Style Perturbation. Qiuling Xu, Guanhong Tao, Siyuan Cheng, Xiangyu Zhang. AAAI 2021. [PDF]
- Knowing When to Stop: Evaluation and Verification of Conformity to Output-size Specifications. Chenglong Wang, Rudy Bunel, Krishnamurthy Dvijotham, Po-Sen Huang, Edward Grefenstette, Pushmeet Kohli. CVPR 2019. [PDF]
- Defense Against Adversarial Images using Web-Scale Nearest-Neighbor Search. Abhimanyu Dubey, Laurens van der Maaten, Zeki Yalniz, Yixuan Li, Dhruv Mahajan. CVPR 2019. [PDF]
- Evading Deepfake-Image Detectors with White- and Black-Box Attacks. Nicholas Carlini, Hany Farid. CVPR 2020. [PDF]
- Detecting Adversarial Samples Using Influence Functions and Nearest Neighbors. Gilad Cohen, Guillermo Sapiro, Raja Giryes. CVPR 2020. [PDF]
- LowKey: Leveraging Adversarial Attacks to Protect Social Media Users from Facial Recognition. Valeriia Cherepanova, Micah Goldblum, Harrison Foley, Shiyuan Duan, John P Dickerson, Gavin Taylor, Tom Goldstein. ICLR 2021. [PDF]
- NIC: Detecting Adversarial Samples with Neural Network Invariant Checking. Shiqing Ma, Yingqi Liu, Guanhong Tao, Wen-Chuan Lee, Xiangyu Zhang. NDSS 2019. [PDF]
- Gotta Catch'Em All: Using Honeypots to Catch Adversarial Attacks on Neural Networks. Shawn Shan, Emily Wenger, Bolun Wang, Bo Li, Haitao Zheng, Ben Y. Zhao. CCS 2020. [PDF]
- Hybrid Batch Attacks: Finding Black-box Adversarial Examples with Limited Queries. Fnu Suya, Jianfeng Chi, David Evans, Yuan Tian. USENIX 2020. [PDF]
- Stealthy Adversarial Perturbations Against Real-Time Video Classification Systems. Shasha Li, Ajaya Neupane, Sujoy Paul, Chengyu Song, Srikanth V. Krishnamurthy, Amit K. Roy Chowdhury and Ananthram Swami. NDSS 2019. [PDF]
- Reinforcement Learning Based Sparse Black-box Adversarial Attack on Video Recognition Models. Zeyuan Wang, Chaofeng Sha, Su Yang. IJCAI 2021. [PDF]
- TkML-AP: Adversarial Attacks to Top-k Multi-Label Learning. Shu Hu, Lipeng Ke, Xin Wang, Siwei Lyu. ICCV 2021. [PDF]
- CloudLeak: Large-Scale Deep Learning Models Stealing Through Adversarial Examples. Honggang Yu, Kaichen Yang, Teng Zhang, Yun-Yun Tsai, Tsung-Yi Ho, Yier Jin. NDSS 2020. [PDF]
- On Improving Adversarial Transferability of Vision Transformers. Muzammal Naseer, Kanchana Ranasinghe, Salman Khan, Fahad Shahbaz Khan, Fatih Porikli. arXiv:2106.04169. [PDF]
- On the Robustness of Vision Transformers to Adversarial Examples. Kaleel Mahmood, Rigel Mahmood, Marten van Dijk. arXiv:2104.02610. [PDF]
- Towards Transferable Adversarial Attacks on Vision Transformers. Zhipeng Wei et al. arXiv:2109.04176. [PDF]
- Evading defenses to transferable adversarial examples by translation-invariant attacks. Yinpeng Dong, Tianyu Pang, Hang Su, Jun Zhu. CVPR 2019. [PDF]
- Black-box adversarial attack with transferable model-based embedding. Zhichao Huang, Tong Zhang. ICLR 2020. [PDF]
- Feature space perturbations yield more transferable adversarial examples. Nathan Inkawhich, Wei Wen, Hai (Helen) Li, Yiran Chen. CVPR 2019. [PDF]
- Towards transferable targeted attack. Maosen Li, Cheng Deng, Tengjiao Li, Junchi Yan, Xinbo Gao, Heng Huang. CVPR 2020. [PDF]
- Learning transferable adversarial examples via ghost networks. Yingwei Li, Song Bai, Yuyin Zhou, Cihang Xie, Zhishuai Zhang, Alan Yuille. AAAI 2020. [PDF]
- Adv-Makeup: A New Imperceptible and Transferable Attack on Face Recognition. Bangjie Yin, Wenxuan Wang, Taiping Yao, Junfeng Guo, Zelun Kong, Shouhong Ding, Jilin Li, Cong Liu. arXiv:2105.03162. [PDF]
- Boosting the transferability of adversarial samples via attention. Weibin Wu, Yuxin Su, Xixian Chen, Shenglin Zhao, Irwin King, Michael R. Lyu, Yu-Wing Tai. CVPR 2020. [PDF]
- Model-Agnostic Meta-Attack: Towards Reliable Evaluation of Adversarial Robustness. Xiao Yang, Yinpeng Dong, Wenzhao Xiang, Tianyu Pang, Hang Su, Jun Zhu. arXiv:2110.08256. [PDF]
- SirenAttack: Generating Adversarial Audio for End-to-End Acoustic Systems. Tianyu Du, Shouling Ji, Jinfeng Li, Qinchen Gu, Ting Wang, Raheem Beyah. ASIA CCS 2020. [PDF]
- Adversarial Music: Real World Audio Adversary Against Wake-word Detection System. Juncheng B. Li, Shuhui Qu, Xinjian Li, Joseph Szurley, J. Zico Kolter, Florian Metze. NeurIPS 2019. [PDF]
- SEC4SR: A Security Analysis Platform for Speaker Recognition. Guangke Chen, Zhe Zhao, Fu Song, Sen Chen, Lingling Fan, Yang Liu. arXiv:2109.01766. [PDF]

Workshops:
- ICCV 2021 2nd Workshop on Adversarial Robustness In the Real World
- Uncertainty & Robustness in Deep Learning (Workshop at ICML 2021)
- Security and Safety in Machine Learning Systems (Workshop at ICLR 2021)
- Generalization beyond the Training Distribution in Brains and Machines (Workshop at ICLR 2021)
- 1st International Workshop on Adversarial Learning for Multimedia (Workshop at ACM Multimedia 2021)
- Workshop on Adversarial Machine Learning in Real-World Computer Vision Systems and Online Challenges (Workshop at CVPR 2021)

Toolboxes & Resources:
- ExplainaBoard: An Explainable Leaderboard for NLP: https://github.com/neulab/ExplainaBoard
- TextFlint: https://github.com/textflint/textflint
- OpenAttack: https://github.com/thunlp/OpenAttack
- SeetaFace2: https://github.com/seetafaceengine/SeetaFace2
- Thermostat: https://arxiv.org/abs/2108.13961
- ALiPy: https://github.com/NUAA-AL/ALiPy

Researchers & Labs:
- Baoyuan Wu: Secure Computing Lab of Big Data, http://scl.sribd.cn/index.html
- Xingjun Ma: http://xingjunma.com/
- Yang Liu: https://personal.ntu.edu.sg/yangliu/
- Cihang Xie: https://cihangxie.github.io/
- Siwei Lyu: https://cse.buffalo.edu/~siweilyu/
- Yisen Wang: https://yisenwang.github.io/
- Sijia Liu: https://lsjxjtu.github.io/
- Xianglong Liu: https://scholar.google.com/citations?user=8VY7ZDcAAAAJ&hl=en
- Bo Li: https://aisecure.github.io/
- Quanshi Zhang:
- Xiaochun Cao: