✨ Please check out our survey paper: [Learn From Model Beyond Fine-Tuning: A Survey](https://arxiv.org/abs/2310.08184)
- [OpenAI] Improving language understanding by generative pre-training
- [arXiv] Better fine-tuning by reducing representational collapse
- [ACM Computing Surveys] Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing
- [arXiv] The power of scale for parameter-efficient prompt tuning
- [ACM Computing Surveys] Recent advances in natural language processing via large pre-trained language models: A survey
- [TMI] Convolutional neural networks for medical image analysis: Full training or fine tuning?
- [ACL] BitFit: Simple parameter-efficient fine-tuning for transformer-based masked language-models
- [CVPR] Robust fine-tuning of zero-shot models
- [arXiv] Fine-tuning can distort pretrained features and underperform out-of-distribution
- [CVPR] DreamBooth: Fine tuning text-to-image diffusion models for subject-driven generation
- [arXiv] Knowledge is a Region in Weight Space for Fine-tuned Language Models
- [NeurIPS] Singular value fine-tuning: Few-shot segmentation requires few-parameters fine-tuning
- [AAAI] On the effectiveness of parameter-efficient fine-tuning
- ⭐ [Nature Machine Intelligence] Parameter-efficient fine-tuning of large-scale pre-trained language models
- [arXiv] Lightweight adapter tuning for multilingual speech translation
- [arXiv] Multi-head adapter routing for cross-task generalization
- [arXiv] AdaMix: Mixture-of-adapter for parameter-efficient tuning of large language models
- [arXiv] LLM-Adapters: An adapter family for parameter-efficient fine-tuning of large language models
- [arXiv] AdapterFusion: Non-destructive task composition for transfer learning
- [arXiv] On the effectiveness of adapter-based tuning for pretrained language model adaptation
- [SLT] Exploring efficient-tuning methods in self-supervised speech models
- [ICASSP] Using adapters to overcome catastrophic forgetting in end-to-end automatic speech recognition
- [arXiv] Learning to Prompt for Vision-Language Models
- [arXiv] Prefix-tuning: Optimizing continuous prompts for generation
- [ICLR] Progressive prompts: Continual learning for language models
- [arXiv] RLPrompt: Optimizing discrete text prompts with reinforcement learning
- [ICML] Black-Box Tuning for Language-Model-as-a-Service (BBTv1)
- [EMNLP] BBTv2: Towards a gradient-free future with large language models
- [arXiv] Gradient-regulated meta-prompt learning for generalizable vision-language models
- [arXiv] Adversarial Prompting for Black Box Foundation Models
- [arXiv] Scalable Prompt Generation for Semi-supervised Learning with Language Models
- [arXiv] Dynamic Prompting: A Unified Framework for Prompt Tuning
- [arXiv] Towards Unifying Medical Vision-and-Language Pre-training via Soft Prompts
- [ICML] PromptBoosting: Black-Box Text Classification with Ten Forward Passes
- [arXiv] On Efficient Training of Large-Scale Deep Learning Models: A Literature Review
- [arXiv] Hard prompts made easy: Gradient-based discrete optimization for prompt tuning and discovery
- [arXiv] Rethinking Efficient Tuning Methods from a Unified Perspective
- [arXiv] Model-tuning Via Prompts Makes NLP Models Adversarially Robust
- [arXiv] UPRISE: Universal Prompt Retrieval for Improving Zero-Shot Evaluation
- [CVPR] Prompt, generate, then cache: Cascade of foundation models makes strong few-shot learners
- [COLING] LINGUIST: Language model instruction tuning to generate annotated utterances for intent classification and slot tagging
- [arXiv] Visual Instruction Tuning
- [arXiv] GPT4RoI: Instruction tuning large language model on region-of-interest
- [ICML] The Flan Collection: Designing data and methods for effective instruction tuning
- [arXiv] Instruction tuning with GPT-4
- [NeurIPS] Training language models to follow instructions with human feedback
- [arXiv] Exploring the benefits of training expert language models over instruction tuning
- [arXiv] Otter: A Multi-Modal Model with In-Context Instruction Tuning
- [arXiv] Augmented Language Models: a Survey
- [arXiv] Improving neural language models with a continuous cache
- [arXiv] Generalization through memorization: Nearest neighbor language models
- [NeurIPS] Retrieval-augmented generation for knowledge-intensive NLP tasks
- [arXiv] Few-shot learning with retrieval augmented language models
- [arXiv] REPLUG: Retrieval-augmented black-box language models
  - Loss function (the KL divergence between the retrieval likelihood and the language-model likelihood; a minimal code sketch follows this entry):

    $$\mathcal{L}=\frac{1}{|\mathcal{B}|} \sum_{x \in \mathcal{B}} \mathrm{KL}\left(P_R(d \mid x) \,\|\, Q_{\mathrm{LM}}(d \mid x, y)\right)$$
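  - A minimal PyTorch sketch of this objective, assuming the retriever similarities and per-document LM likelihoods are already computed; the function name `replug_lsr_loss`, the tensor shapes, and the temperatures `gamma`/`beta` are illustrative assumptions, not the paper's reference implementation:

    ```python
    import torch
    import torch.nn.functional as F

    def replug_lsr_loss(retrieval_scores: torch.Tensor,
                        lm_log_likelihoods: torch.Tensor,
                        gamma: float = 1.0,
                        beta: float = 1.0) -> torch.Tensor:
        """L = (1/|B|) * sum_{x in B} KL( P_R(d|x) || Q_LM(d|x,y) ).

        retrieval_scores:   [B, K] retriever similarities s(d, x) over K retrieved docs
        lm_log_likelihoods: [B, K] log P_LM(y | d, x), one score per retrieved doc
        gamma, beta:        softmax temperatures (illustrative knobs, not from the paper)
        """
        # Retrieval likelihood P_R(d|x): softmax over the K retrieved documents.
        log_p_r = F.log_softmax(retrieval_scores / gamma, dim=-1)
        # LM likelihood Q_LM(d|x,y): softmax over the per-document LM scores,
        # detached so the frozen (black-box) LM only supervises the retriever.
        log_q_lm = F.log_softmax(lm_log_likelihoods.detach() / beta, dim=-1)
        # KL(P || Q) = sum_d P * (log P - log Q), averaged over the batch B.
        p_r = log_p_r.exp()
        return (p_r * (log_p_r - log_q_lm)).sum(dim=-1).mean()
    ```

    Gradients flow only through `retrieval_scores`, so only the retriever is updated while the language model stays frozen.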
- [arXiv] MuRAG: Multimodal retrieval-augmented generator for open question answering over images and text
- [arXiv] Re-Imagen: Retrieval-Augmented Text-to-Image Generator
- [OpenReview] Retrieval-Augmented Multimodal Language Modeling
- [arXiv] A Survey on Retrieval-Augmented Text Generation
- [IJCV] Knowledge distillation: A survey
- [arXiv] Data-Free Knowledge Transfer: A Survey
- [arXiv] Data-free knowledge distillation for deep neural networks
- [CVPR] Learning to Retain while Acquiring: Combating Distribution-Shift in Adversarial Data-Free Knowledge Distillation
- [arXiv] Large Language Model Is Not a Good Few-shot Information Extractor, but a Good Reranker for Hard Samples!
- [arXiv] The Life Cycle of Knowledge in Big Language Models: A Survey
- [arXiv] Improving Generalization of Pre-trained Language Models via Stochastic Weight Averaging
- [arXiv] Feature Affinity Assisted Knowledge Distillation and Quantization of Deep Neural Networks on Label-Free Data
- [CVPR] Generic-to-Specific Distillation of Masked Autoencoders
- [arXiv] Deep Classifier Mimicry without Data Access
- [Multiple Classifier Systems] Ensemble methods in machine learning
- ⭐ [arXiv] BatchEnsemble: An Alternative Approach to Efficient Ensemble and Lifelong Learning
- ⭐ [arXiv] Tangent Model Composition for Ensembling and Continual Fine-tuning
- ⭐ [ICML] Linear Mode Connectivity and the Lottery Ticket Hypothesis
- [OpenReview] On convexity and linear mode connectivity in neural networks
- ⭐ [ICML] Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time
- [arXiv] Stop Wasting My Time! Saving Days of ImageNet and BERT Training with Latest Weight Averaging
- [arXiv] Understanding the Effectiveness of Early Weight Averaging for Training Large Language Models
- [ICLR] Editing models with task arithmetic
- ⭐ [arXiv] Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs
- [arXiv] Git Re-Basin: Merging Models modulo Permutation Symmetries
- [arXiv] ZipIt! Merging Models from Different Tasks without Training
- [arXiv] Dataless Knowledge Fusion by Merging Weights of Language Models
- [NeurIPS] Merging Models with Fisher-Weighted Averaging
- [arXiv] AdapterSoup: Weight Averaging to Improve Generalization of Pretrained Language Models
- [TCSVT] Progressive meta-learning with curriculum
- [CVPR] MetaFSCIL: A meta-learning approach for few-shot class incremental learning
- [ICML] Transformer Neural Processes: Uncertainty-Aware Meta Learning Via Sequence Modeling
- [UAI] Meta-learning without data via Wasserstein distributionally-robust model fusion
- [CVPR] Meta-Learning for Multi-Label Few-Shot Classification
- [ECCV] Meta-Learning with Less Forgetting on Large-Scale Non-Stationary Task Distributions
- [CVPR] Learning To Learn and Remember Super Long Multi-Domain Task Sequence
- [ICSE] Cross-domain deep code search with meta learning
- [arXiv] Learning to Learn from APIs: Black-Box Data-Free Meta-Learning
- [CVPR] Architecture, Dataset and Model-Scale Agnostic Data-Free Meta-Learning
- [arXiv] FILM: How can Few-Shot Image Classification Benefit from Pre-Trained Language Models?
- [AAAI] Training Meta-Surrogate Model for Transferable Adversarial Attack
- [S&P] D-DAE: Defense-Penetrating Model Extraction Attacks
- [Neurocomputing] MGML: Momentum group meta-learning for few-shot image classification
- [ICRA] Meta-Learning-Based Optimal Control for Soft Robotic Manipulators to Interact with Unknown Environments
- [arXiv] Speeding Up Multi-Objective Hyperparameter Optimization by Task Similarity-Based Meta-Learning for the Tree-Structured Parzen Estimator
- [Neuromorphic Computing and Engineering] Meta-learning spiking neural networks with surrogate gradient descent
- [PMLR] The Role of Deconfounding in Meta-learning
- [TSP] Distributed Reptile Algorithm for Meta-Learning Over Multi-Agent Systems
- [NeurIPS] Efficient and Effective Multi-task Grouping via Meta Learning on Task Combinations
- [Computers & Graphics] An overview on Meta-learning approaches for Few-shot Weakly-supervised Segmentation
- [EMNLP] Editing Large Language Models: Problems, Methods, and Opportunities
- [EMNLP] Memory-assisted prompt editing to improve GPT-3 after deployment
- [arXiv] Transformer-Patcher: One Mistake Worth One Neuron
- [arXiv] Calibrating Factual Knowledge in Pretrained Language Models
- [arXiv] Can LMs Learn New Entities from Descriptions? Challenges in Propagating Injected Knowledge
- [arXiv] Fixing Model Bugs with Natural Language Patches
- [arXiv] Modifying Memories in Transformer Models
- [arXiv] Mass-Editing Memory in a Transformer
- [NeurIPS] Locating and Editing Factual Associations in GPT
- [arXiv] Rank-One Editing of Encoder-Decoder Models
- [arXiv] Prompt-Based Editing for Text Style Transfer
- [CVPR] Conditional Text Image Generation With Diffusion Models
- [arXiv] Crawling the Internal Knowledge-Base of Language Models
If you find this repository useful, please consider citing our paper:

    @article{zheng2023learn,
      title={Learn From Model Beyond Fine-Tuning: A Survey},
      author={Zheng, Hongling and Shen, Li and Tang, Anke and Luo, Yong and Hu, Han and Du, Bo and Tao, Dacheng},
      journal={arXiv preprint arXiv:2310.08184},
      year={2023}
    }