
Learn From Model Beyond Fine-Tuning: A Survey

✨ Please check out our survey paper https://arxiv.org/abs/2310.08184

Table of Contents

Papers

Model Tuning

Weight Engineering

Fine Tuning

  • [mikecaptain] Improving language understanding by generative pretraining

  • [arXiv] Better fine-tuning by reducing representational collapse

  • [ACM Computing Surveys] Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing

  • [arXiv] The power of scale for parameter-efficient prompt tuning

  • [ACM Computing Surveys] Recent advances in natural language processing via large pre-trained language models: A survey

  • [TMI] Convolutional neural networks for medical image analysis: Full training or fine tuning?

  • [ACL] Bitfit: Simple parameter-efficient fine-tuning for transformer-based masked language-models

  • [CVPR] Robust fine-tuning of zero-shot models

  • [arXiv] Fine-tuning can distort pretrained features and underperform out-of-distribution

  • [CVPR] Dreambooth: Fine tuning text-to-image diffusion models for subject-driven generation

  • [arXiv] Knowledge is a Region in Weight Space for Fine-tuned Language Models

  • [NeurIPS] Singular value fine-tuning: Few-shot segmentation requires few-parameters fine-tuning

  • [AAAI] On the effectiveness of parameter-efficient fine-tuning

  • ⭐ [Nature Machine Intelligence] Parameter-efficient fine-tuning of large-scale pre-trained language models

Adapter Tuning

  • [arXiv] Lightweight adapter tuning for multilingual speech translation

  • [arXiv] Multi-head adapter routing for cross-task generalization

  • [arXiv] Adamix: Mixture-of-adapter for parameter-efficient tuning of large language models

  • [arXiv] Llm-adapters: An adapter family for parameter-efficient fine-tuning of large language models

  • [arXiv] Adapterfusion: Non-destructive task composition for transfer learning

  • [arXiv] On the effectiveness of adapter-based tuning for pretrained language model adaptation

  • [SLT] Exploring efficient-tuning methods in self-supervised speech models

  • [ICASSP] Using adapters to overcome catastrophic forgetting in end-to-end automatic speech recognition

Input Engineering

Prompt Tuning

  • [arXiv] Learning to Prompt for Vision-Language Models

  • [arXiv] Prefix-tuning: Optimizing continuous prompts for generation

  • [ICLR] Progressive prompts: Continual learning for language models

  • [arXiv] Rlprompt: Optimizing discrete text prompts with reinforcement learning

  • [ICML] Black-Box Tuning for Language-Model-as-a-Service (BBTv1)

  • [EMNLP] BBTv2: Towards a gradient-free future with large language models

  • [arXiv] Gradient-regulated meta-prompt learning for generalizable vision-language models

  • [arXiv] Adversarial Prompting for Black Box Foundation Models

  • [ACM Computing Surveys] Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing

  • [arXiv] Scalable Prompt Generation for Semi-supervised Learning with Language Models

  • [arXiv] Dynamic Prompting: A Unified Framework for Prompt Tuning

  • [arXiv] Towards Unifying Medical Vision-and-Language Pre-training via Soft Prompts

  • [ICML] PromptBoosting: Black-Box Text Classification with Ten Forward Passes

  • [arXiv] On Efficient Training of Large-Scale Deep Learning Models: A Literature Review

  • [arXiv] Hard prompts made easy: Gradient-based discrete optimization for prompt tuning and discovery

  • [arXiv] Rethinking Efficient Tuning Methods from a Unified Perspective

  • [arXiv] Model-tuning Via Prompts Makes NLP Models Adversarially Robust

  • [arXiv] UPRISE: Universal Prompt Retrieval for Improving Zero-Shot Evaluation

  • [CVPR] Prompt, generate, then cache: Cascade of foundation models makes strong few-shot learners

Instruction Tuning

  • [COLING] Linguist: Language model instruction tuning to generate annotated utterances for intent classification and slot tagging

  • [arXiv] Visual Instruction Tuning

  • [arXiv] Gpt4roi: Instruction tuning large language model on region-of-interest

  • [ICML] The flan collection: Designing data and methods for effective instruction tuning

  • [arXiv] Instruction tuning with gpt-4

  • [NeurIPS] Training language models to follow instructions with human feedback

  • [arXiv] Exploring the benefits of training expert language models over instruction tuning

  • [arXiv] Otter: A Multi-Modal Model with In-Context Instruction Tuning

Database Augmentation

  • [arXiv] Augmented Language Models: a Survey

Language Database Augmentation

  • [arXiv] Improving neural language models with a continuous cache

  • [arXiv] Generalization through memorization: Nearest neighbor language models

  • [NeurIPS] Retrieval-augmented generation for knowledge-intensive nlp tasks

  • [arXiv] Few-shot learning with retrieval augmented language models

  • [arXiv] Replug: Retrieval-augmented black-box language models

    • Loss function (KL divergence between the retrieval likelihood and the language-model likelihood): $$\mathcal{L}=\frac{1}{|\mathcal{B}|} \sum_{x \in \mathcal{B}} \mathrm{KL}\left(P_R(d \mid x) \,\|\, Q_{\mathrm{LM}}(d \mid x, y)\right)$$
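    The objective above can be sketched numerically. Below is a minimal NumPy illustration (not the authors' code; normalizing raw document scores with a softmax is an assumption for this sketch) of the batch-averaged KL divergence between the retrieval distribution and the LM-derived distribution over retrieved documents:

    ```python
    import numpy as np

    def replug_kl_loss(retrieval_scores, lm_scores):
        """Illustrative sketch of the REPLUG-style objective: mean over a
        batch of KL(P_R(d|x) || Q_LM(d|x,y)). Both inputs are arrays of
        shape (batch, num_docs) holding unnormalized per-document scores;
        the softmax normalization here is an assumption of this sketch."""
        def softmax(s):
            e = np.exp(s - s.max(axis=-1, keepdims=True))
            return e / e.sum(axis=-1, keepdims=True)

        p_r = softmax(retrieval_scores)   # retrieval likelihood P_R(d | x)
        q_lm = softmax(lm_scores)         # LM likelihood Q_LM(d | x, y)
        kl = (p_r * (np.log(p_r) - np.log(q_lm))).sum(axis=-1)  # per-example KL
        return kl.mean()                  # average over the batch B
    ```

    Minimizing this loss pushes the retriever's distribution toward documents the language model finds most helpful.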

Multimodal Database Augmentation

  • [arXiv] Murag: Multimodal retrieval-augmented generator for open question answering over images and text

  • [arXiv] Re-Imagen: Retrieval-Augmented Text-to-Image Generator

  • [OpenReview] Retrieval-Augmented Multimodal Language Modeling

  • [arXiv] A Survey on Retrieval-Augmented Text Generation

Model Distillation

Noise Optimization

  • [IJCV] Knowledge distillation: A survey

  • [arXiv] Data-Free Knowledge Transfer: A Survey

Generative Reconstruction

  • [arXiv] Data-free knowledge distillation for deep neural networks

  • [CVPR] Learning to Retain while Acquiring: Combating Distribution-Shift in Adversarial Data-Free Knowledge Distillation

  • [arXiv] Large Language Model Is Not a Good Few-shot Information Extractor, but a Good Reranker for Hard Samples!

  • [arXiv] The Life Cycle of Knowledge in Big Language Models: A Survey

  • [arXiv] Improving Generalization of Pre-trained Language Models via Stochastic Weight Averaging

  • [arXiv] Feature Affinity Assisted Knowledge Distillation and Quantization of Deep Neural Networks on Label-Free Data

Adversarial Exploration

  • [CVPR] Generic-to-Specific Distillation of Masked Autoencoders

  • [arXiv] Deep Classifier Mimicry without Data Access

Model Reuse

Model Ensemble

  • [Multiple Classifier Systems] Ensemble methods in machine learning

  • ⭐ [arXiv] BatchEnsemble: An Alternative Approach to Efficient Ensemble and Lifelong Learning

  • ⭐ [arXiv] Tangent Model Composition for Ensembling and Continual Fine-tuning
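The ensembling papers above share one core idea: combine the outputs of several independently trained models. A generic output-space sketch (illustrative only, not tied to any specific paper in this list) that averages softmax probabilities before taking the argmax:

```python
import numpy as np

def ensemble_predict(list_of_logits):
    """Illustrative output-space ensemble: average the softmax
    probabilities from several models' logits (each of shape
    (batch, num_classes)) and return the argmax class per example."""
    def softmax(z):
        e = np.exp(z - z.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    probs = np.mean([softmax(z) for z in list_of_logits], axis=0)
    return probs.argmax(axis=-1)
```

Averaging probabilities rather than raw logits keeps each model's contribution on the same scale regardless of its logit magnitudes.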

Model Fusion

Weight Interpolation

  • ⭐ [ICML] Linear Mode Connectivity and the Lottery Ticket Hypothesis

  • [OpenReview] On convexity and linear mode connectivity in neural networks

  • ⭐ [ICML] Model soups: averaging weights of multiple finetuned models improves accuracy without increasing inference time

  • [arXiv] Stop Wasting My Time! Saving Days of ImageNet and BERT Training with Latest Weight Averaging

  • [arXiv] Understanding the Effectiveness of Early Weight Averaging for Training Large Language Models

  • [ICLR] Editing models with task arithmetic
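The "model soup" idea referenced above, uniformly averaging the weights of several models fine-tuned from the same pre-trained checkpoint, can be sketched in a few lines. This is an illustrative reimplementation under the assumption that each model is given as a name-to-array parameter dictionary, not the authors' code:

```python
import numpy as np

def model_soup(weight_dicts):
    """Illustrative uniform 'model soup': element-wise average of the
    parameters of several fine-tuned models sharing one architecture.
    Each entry of weight_dicts maps parameter name -> np.ndarray."""
    keys = weight_dicts[0].keys()
    return {k: np.mean([w[k] for w in weight_dicts], axis=0) for k in keys}
```

Unlike an output-space ensemble, the averaged model costs no extra inference time, which is the central appeal of weight interpolation.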

Mode Connectivity Based Method

  • ⭐ [ICML] Linear Mode Connectivity and the Lottery Ticket Hypothesis

  • [OpenReview] On convexity and linear mode connectivity in neural networks

  • ⭐ [arXiv] Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs

  • [arXiv] Git Re-Basin: Merging Models modulo Permutation Symmetries

  • [arXiv] ZipIt! Merging Models from Different Tasks without Training

Straightforward Optimization

  • [arXiv] Dataless Knowledge Fusion by Merging Weights of Language Models

  • [NeurIPS] Merging Models with Fisher-Weighted Averaging

  • [arXiv] AdapterSoup: Weight Averaging to Improve Generalization of Pretrained Language Models

Meta Learning

White-box Data-free Meta-learning

  • [TCSVT] Progressive meta-learning with curriculum

  • [CVPR] Metafscil: A meta-learning approach for few-shot class incremental learning.

  • [ICML] Transformer Neural Processes: Uncertainty-Aware Meta Learning Via Sequence Modeling

Black-box Data-free Meta-learning

  • [UAI] Meta-learning without data via Wasserstein distributionally-robust model fusion

  • [CVPR] Meta-Learning for Multi-Label Few-Shot Classification

  • [ECCV] Meta-Learning with Less Forgetting on Large-Scale Non-Stationary Task Distributions

  • [CVPR] Learning To Learn and Remember Super Long Multi-Domain Task Sequence

  • [ICSE] Cross-domain deep code search with meta learning

  • [arXiv] Learning to Learn from APIs: Black-Box Data-Free Meta-Learning

  • [CVPR] Architecture, Dataset and Model-Scale Agnostic Data-Free Meta-Learning

  • [arXiv] FILM: How can Few-Shot Image Classification Benefit from Pre-Trained Language Models?

  • [AAAI] Training Meta-Surrogate Model for Transferable Adversarial Attack

  • [SP] D-DAE: Defense-Penetrating Model Extraction Attacks

  • [Neurocomputing] MGML: Momentum group meta-learning for few-shot image classification

  • [ICRA] Meta-Learning-Based Optimal Control for Soft Robotic Manipulators to Interact with Unknown Environments

  • [arXiv] Speeding Up Multi-Objective Hyperparameter Optimization by Task Similarity-Based Meta-Learning for the Tree-Structured Parzen Estimator

  • [Neuromorphic Computing and Engineering] Meta-learning spiking neural networks with surrogate gradient descent

  • [PMLR] The Role of Deconfounding in Meta-learning

  • [ITSP] Distributed Reptile Algorithm for Meta-Learning Over Multi-Agent Systems

  • [NeurIPS] Efficient and Effective Multi-task Grouping via Meta Learning on Task Combinations

  • [Computers & Graphics] An overview on Meta-learning approaches for Few-shot Weakly-supervised Segmentation

Model Editing

Memory Based Model Editing

  • [EMNLP] Editing Large Language Models: Problems, Methods, and Opportunities

  • [EMNLP] Memory-assisted prompt editing to improve GPT-3 after deployment

Parameter Based Model Editing

Constrained Tuning

  • [arXiv] Transformer-Patcher: One Mistake worth One Neuron

  • [arXiv] Calibrating Factual Knowledge in Pretrained Language Models

  • [arXiv] Can LMs Learn New Entities from Descriptions? Challenges in Propagating Injected Knowledge

  • [arXiv] Fixing Model Bugs with Natural Language Patches

Locate And Edit

  • [arXiv] Modifying Memories in Transformer Models

  • [arXiv] Mass-Editing Memory in a Transformer

  • [NeurIPS] Locating and Editing Factual Associations in GPT

Meta Learning

  • [arXiv] Rank-One Editing of Encoder-Decoder Models

  • [arXiv] Prompt-Based Editing for Text Style Transfer

  • [CVPR] Conditional Text Image Generation With Diffusion Models

  • [arXiv] Crawling the Internal Knowledge-Base of Language Models

  • [arXiv] The Life Cycle of Knowledge in Big Language Models: A Survey

Citation

If you find this repository useful, please consider citing this paper:

@article{zheng2023learn,
  title={Learn From Model Beyond Fine-Tuning: A Survey},
  author={Zheng, Hongling and Shen, Li and Tang, Anke and Luo, Yong and Hu, Han and Du, Bo and Tao, Dacheng},
  journal={arXiv preprint arXiv:2310.08184},
  year={2023}
}
