Yuwei (Evelyn) Zhang edited this page Jul 21, 2024 · 7 revisions

Health-related

PneumoLLM: Harnessing the Power of Large Language Model for Pneumoconiosis Diagnosis

  • Input: Chest Radiography
  • Method: adapter tuning (+ prompt tuning?)
  • Use a pre-trained CLIP model for both the text and image encoders (CLAP in the audio case)
  • Three trainable parts: adapter, trainable prompt, and classifier
  • Masked attention for the prompt (unidirectional attention)
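
The trainable pieces listed above can be sketched roughly as follows. This is a generic residual bottleneck adapter plus a causal (unidirectional) mask, not necessarily the exact PneumoLLM design; all dimensions, the zero initialisation, and the names `bottleneck_adapter` and `causal_mask` are assumptions for illustration.

```python
import numpy as np

def bottleneck_adapter(h, w_down, w_up):
    """Residual bottleneck adapter: h + relu(h @ w_down) @ w_up."""
    return h + np.maximum(h @ w_down, 0.0) @ w_up

def causal_mask(n):
    """Lower-triangular mask: position i may attend only to positions <= i."""
    return np.tril(np.ones((n, n), dtype=bool))

rng = np.random.default_rng(0)
d_model, d_bottleneck, n_tokens, n_prompt = 16, 4, 5, 3  # hypothetical sizes

tokens = rng.normal(size=(n_tokens, d_model))  # stand-in for frozen encoder features
prompt = np.zeros((n_prompt, d_model))         # trainable prompt vectors (assumed init)
w_down = rng.normal(size=(d_model, d_bottleneck)) * 0.02  # trainable
w_up = np.zeros((d_bottleneck, d_model))       # zero init, so the adapter starts as identity

sequence = np.concatenate([tokens, prompt], axis=0)
adapted = bottleneck_adapter(sequence, w_down, w_up)
mask = causal_mask(len(sequence))
```

With `w_up` at zero the adapter leaves the frozen features unchanged, so training starts from the pre-trained model's behaviour; only `prompt`, `w_down`, `w_up`, and a classifier head would receive gradients in this sketch.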

A foundation model for generalizable disease detection from retinal images | Nature 2023

  • Input: retinal images
  • Method: fine-tuning the entire model
  • Curate open datasets to train two models, one per imaging modality
  • 8 datasets, three groups of evaluation tasks, all via fine-tuning, including external evaluation (fine-tune on A, test on B)
  • MAE and 4 contrastive methods
  • Interpretability through a tool using saliency maps
  • some analysis on distribution shift
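
For reference, the core of MAE-style pre-training is random patch masking: only a small visible subset of patches is encoded, and the rest are reconstructed. A minimal sketch, assuming a 75% mask ratio (the default in the original MAE paper) and hypothetical patch dimensions; the notes above do not say which settings this paper uses.

```python
import numpy as np

def random_patch_mask(n_patches, mask_ratio, rng):
    """Return indices of visible patches and a boolean mask (True = masked)."""
    n_keep = int(n_patches * (1 - mask_ratio))
    perm = rng.permutation(n_patches)
    keep_idx = np.sort(perm[:n_keep])
    mask = np.ones(n_patches, dtype=bool)
    mask[keep_idx] = False
    return keep_idx, mask

rng = np.random.default_rng(0)
patches = rng.normal(size=(196, 768))  # e.g. 14x14 patches, hypothetical embedding size
keep_idx, mask = random_patch_mask(len(patches), 0.75, rng)
visible = patches[keep_idx]            # only these would be fed to the encoder
```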

Large Language Models for Cuffless Blood Pressure Measurement From Wearable Biosignals

  • Data: handcrafted features from ECG and PPG
  • Method: prompt design & fine-tuning
  • Context-enhanced prompts: domain knowledge + user information
  • Compared 10 LLMs and the amount of training data needed
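
A context-enhanced prompt of the kind described above might be assembled like this. The feature names, the wording of the task, and the domain-knowledge sentence are all illustrative assumptions, not the paper's actual template.

```python
def build_bp_prompt(features, user_info, domain_knowledge):
    """Combine domain knowledge, user information, and handcrafted features into one prompt."""
    lines = [
        "Task: estimate systolic and diastolic blood pressure (mmHg).",
        "Domain knowledge: " + domain_knowledge,
        "User: " + ", ".join(f"{k}={v}" for k, v in user_info.items()),
        "Features: " + ", ".join(f"{k}={v:.3f}" for k, v in features.items()),
        "Answer as: SBP=<value>, DBP=<value>",
    ]
    return "\n".join(lines)

prompt = build_bp_prompt(
    features={"heart_rate_bpm": 72.0, "pulse_arrival_time_s": 0.21},  # hypothetical ECG/PPG features
    user_info={"age": 45, "sex": "F"},                                # hypothetical user context
    domain_knowledge="Shorter pulse arrival time is generally associated with higher blood pressure.",
)
```

The same string can serve as zero-shot input or as the input half of a fine-tuning pair, with the reference blood pressure values as the target.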

PEFT methods

Prefix tuning

Prompt tuning

Adapter

LoRA

(IA)^3

AutoLoRa: An Automated Robust Fine-Tuning Framework ICLR 2024 | code

  • Challenge: the adversarial and natural objectives diverge during robust fine-tuning (RFT)
  • Introduce a low-rank (LoRa) branch that disentangles RFT into two distinct components: the natural objective is optimized via the LoRa branch and the adversarial objective via the feature extractor
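
The two-path idea can be sketched for a single linear layer: natural inputs pass through the weight plus the low-rank branch, adversarial inputs through the weight alone. This is a simplified reading of the notes above, not the authors' implementation; the sizes, the sign-noise stand-in for an attack, and the `forward` helper are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank = 8, 6, 2  # hypothetical sizes

W = rng.normal(size=(d_in, d_out))        # feature-extractor weight
A = rng.normal(size=(d_in, rank)) * 0.01  # low-rank down-projection
B = np.zeros((rank, d_out))               # low-rank up-projection, zero init

def forward(x, use_lora_branch):
    """Linear layer with an optional low-rank branch added to the output."""
    out = x @ W
    if use_lora_branch:
        out = out + (x @ A) @ B
    return out

x_natural = rng.normal(size=(4, d_in))
# Stand-in perturbation only; a real setup would use an actual attack such as PGD.
x_adversarial = x_natural + 0.03 * np.sign(rng.normal(size=x_natural.shape))

natural_out = forward(x_natural, use_lora_branch=True)           # natural objective path
adversarial_out = forward(x_adversarial, use_lora_branch=False)  # adversarial objective path
```

Because `B` starts at zero, both paths initially agree on natural inputs, and the low-rank branch only adds `rank * (d_in + d_out)` extra parameters.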

Modality fusion methods

Time-LLM: Time Series Forecasting by Reprogramming Large Language Models

  • Patch reprogramming: first reduce the text vocabulary embeddings (32000) to prototypes (2000) through a linear layer, then apply multi-head attention between the TS patch embeddings and the text prototype embeddings
  • Concatenate the prompt with the reprogrammed TS patch embeddings, feed them into the frozen LLM, and add a linear layer for forecasting
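
The two steps above can be sketched as follows, with scaled-down sizes and a single attention head instead of multi-head attention to keep it short. All dimensions and weight names are assumptions; the frozen LLM and the forecasting head are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
# Scaled-down stand-ins for vocabulary size, prototype count, and model widths.
vocab, n_proto, d_llm, d_patch, n_patches, n_prompt = 320, 20, 32, 16, 7, 5

word_emb = rng.normal(size=(vocab, d_llm))            # frozen LLM word embeddings
to_proto = rng.normal(size=(n_proto, vocab)) / vocab  # trainable linear map over the vocab axis
prototypes = to_proto @ word_emb                      # (n_proto, d_llm)

patch_emb = rng.normal(size=(n_patches, d_patch))     # time-series patch embeddings
W_q = rng.normal(size=(d_patch, d_llm)) / np.sqrt(d_patch)
W_k = rng.normal(size=(d_llm, d_llm)) / np.sqrt(d_llm)
W_v = rng.normal(size=(d_llm, d_llm)) / np.sqrt(d_llm)

# Cross-attention: patches are queries, text prototypes are keys and values.
q, k, v = patch_emb @ W_q, prototypes @ W_k, prototypes @ W_v
attn = softmax(q @ k.T / np.sqrt(d_llm))              # (n_patches, n_proto)
reprogrammed = attn @ v                               # (n_patches, d_llm)

prompt_emb = rng.normal(size=(n_prompt, d_llm))       # stand-in for embedded prompt tokens
llm_input = np.concatenate([prompt_emb, reprogrammed], axis=0)
```

Each reprogrammed patch is a weighted mixture of text prototypes, so it lives in the LLM's embedding space and can be concatenated directly with the prompt tokens.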

SpeechVerse: A Large-scale Generalizable Audio Language Model | Amazon AWS AI Labs

  • Use a CNN to downsample the audio embeddings (over all frames)
  • Use LoRA to tune the LLM
  • Use curriculum learning to deal with the gradient explosion observed when training both modules jointly, so train the CNN first and ...
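
The downsampling step can be illustrated with a single strided 1-D convolution over the frame axis. The kernel size, stride, and feature widths here are assumptions for illustration, not the SpeechVerse configuration.

```python
import numpy as np

def conv1d_downsample(frames, kernel, stride):
    """Strided 1-D convolution over time.

    frames: (T, d_in); kernel: (k, d_in, d_out). Returns (T_out, d_out).
    """
    k = kernel.shape[0]
    t_out = (frames.shape[0] - k) // stride + 1
    out = np.empty((t_out, kernel.shape[2]))
    for t in range(t_out):
        window = frames[t * stride : t * stride + k]  # (k, d_in)
        out[t] = np.einsum("ki,kio->o", window, kernel)
    return out

rng = np.random.default_rng(0)
frames = rng.normal(size=(100, 8))          # 100 audio-encoder frames, hypothetical width 8
kernel = rng.normal(size=(4, 8, 16)) * 0.1  # trainable conv weights
downsampled = conv1d_downsample(frames, kernel, stride=4)
```

Here 100 frames become 25, so the LLM sees a sequence four times shorter than the raw audio-encoder output.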

Q-Former BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models

  • cross-modal alignment to leverage unimodal foundation models, bridge the modality gap with a lightweight Query Transformer (e.g. BERT 188M)
  • Q-Former: set of learnable query vectors to extract visual features from frozen image encoder, act as bottleneck
  • 2 stages to pre-train the Q-Former:
    1. vision-language representation learning, learn visual representations more relevant to the text
    2. vision-to-language generative learning, connect the output to a frozen LLM and train
  • Further: PEFT the full model, the Q-Former, or both
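
The bottleneck role of the learnable queries can be sketched with one cross-attention step: however many patch features the frozen image encoder emits, the output is always `n_queries` tokens. This omits the Q-Former's self-attention, projections, and text branch; the feature widths and the `to_llm` projection are assumptions (32 queries matches the BLIP-2 paper).

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def query_cross_attention(queries, image_feats):
    """Each learnable query attends over all frozen image features."""
    scores = queries @ image_feats.T / np.sqrt(queries.shape[1])
    return softmax(scores) @ image_feats

rng = np.random.default_rng(0)
n_queries, d, d_llm = 32, 64, 48                    # d and d_llm are hypothetical
queries = rng.normal(size=(n_queries, d))           # learnable query vectors
to_llm = rng.normal(size=(d, d_llm)) / np.sqrt(d)   # linear projection into the LLM input space

shapes = {}
for n_patches in (50, 257):
    image_feats = rng.normal(size=(n_patches, d))   # stand-in for frozen image encoder output
    visual_tokens = query_cross_attention(queries, image_feats) @ to_llm
    shapes[n_patches] = visual_tokens.shape
```

The fixed-size `visual_tokens` are what would be prepended to the text input of the frozen LLM in the second pre-training stage.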
