LLM

Yuwei (Evelyn) Zhang edited this page Jul 21, 2024 · 7 revisions

Health-related

PneumoLLM: Harnessing the Power of Large Language Model for Pneumoconiosis Diagnosis

Input: Chest Radiography
Method: adaptor tuning (+ prompt tuning?)
Use a pre-trained CLIP model for both the text and image encoders (the audio analogue would be CLAP)
Three trainable parts are added: adaptor, trainable prompt, and classifier
Masked attention for the prompt, i.e. unidirectional attention

A foundation model for generalizable disease detection from retinal images (Nature 2023)

Input: retinal images
Method: fine-tuning the entire model
Curate open datasets to train two models, one per imaging modality
8 datasets and three groups of evaluation tasks with fine-tuning, including external evaluation (fine-tune on A, test on B)
Self-supervised pre-training: MAE and 4 contrastive methods
Interpretability through a saliency-map tool
Some analysis of distribution shift

Large Language Models for Cuffless Blood Pressure Measurement From Wearable Biosignals

Data: handcrafted features from ECG and PPG
Method: prompt design and fine-tuning
Context-enhanced prompts: domain knowledge + user information
Compared 10 LLMs and the amount of training data needed

PEFT methods

Prefix tuning

Prompt tuning

Adaptor

LoRA

(IA)^3

AutoLoRa: An Automated Robust Fine-Tuning Framework (ICLR 2024) | code

Challenge: the adversarial and natural objectives diverge during robust fine-tuning (RFT)
Introduce a low-rank (LoRa) branch that disentangles RFT into two distinct components: the natural objective is optimized via the LoRa branch and the adversarial objective via the feature extractor
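
For reference, the low-rank branch above builds on the standard LoRA parameterization, where a frozen weight W is augmented with a trainable update (alpha / r) * B @ A. The sketch below shows only that forward pass in NumPy; it is plain LoRA, not AutoLoRa's training procedure, and the class name `LoRALinear`, the sizes, and the random stand-in weights are assumptions made for illustration.

```python
import numpy as np

class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, d_in, d_out, r=4, alpha=8.0, seed=0):
        rng = np.random.default_rng(seed)
        # Random stand-in for a frozen pre-trained weight (assumption).
        self.W = rng.normal(size=(d_out, d_in))
        # Trainable factors: A gets a small random init, B starts at zero,
        # so the low-rank update is exactly zero before any training step.
        self.A = rng.normal(scale=0.01, size=(r, d_in))
        self.B = np.zeros((d_out, r))
        self.scale = alpha / r

    def forward(self, x):
        # x: (batch, d_in) -> (batch, d_out)
        return x @ self.W.T + self.scale * (x @ self.A.T) @ self.B.T

    def trainable_params(self):
        # Only A and B would receive gradients; W stays frozen.
        return self.A.size + self.B.size

layer = LoRALinear(d_in=16, d_out=8)
x = np.ones((2, 16))
y = layer.forward(x)
```

Because B is zero-initialized, `y` equals the frozen layer's output at the start, and only `r * (d_in + d_out)` parameters are trainable instead of `d_in * d_out`.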

Modality fusion methods

Time-LLM: Time Series Forecasting by Reprogramming Large Language Models

Patch reprogramming: first reduce the word-embedding vocabulary (32000) to text prototypes (2000) through a linear layer, then apply multi-head attention between the time-series (TS) patch embeddings (queries) and the text prototype embeddings (keys and values)
Concatenate the prompt embeddings with the reprogrammed TS patch embeddings, feed them into the frozen LLM, and add a linear layer for forecasting
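
The two steps above can be sketched as a forward pass in NumPy. This is a rough illustration under assumptions, not the Time-LLM implementation: the function names, the toy sizes (50 words reduced to 10 prototypes instead of 32000 to 2000), and the random weights are all hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def split_heads(x, n_heads):
    # (N, H * d_k) -> (H, N, d_k)
    n, d = x.shape
    return x.reshape(n, n_heads, d // n_heads).transpose(1, 0, 2)

def reprogram(patches, word_emb, W_proto, Wq, Wk, Wv, Wo, n_heads):
    # Linear layer over the vocabulary axis: (V', V) @ (V, d_llm) -> (V', d_llm)
    prototypes = W_proto @ word_emb
    Q = split_heads(patches @ Wq, n_heads)      # (H, P, d_k), from TS patches
    K = split_heads(prototypes @ Wk, n_heads)   # (H, V', d_k), from prototypes
    V = split_heads(prototypes @ Wv, n_heads)   # (H, V', d_k)
    d_k = Q.shape[-1]
    attn = softmax(Q @ K.transpose(0, 2, 1) / np.sqrt(d_k))  # (H, P, V')
    heads = attn @ V                                         # (H, P, d_k)
    merged = heads.transpose(1, 0, 2).reshape(patches.shape[0], -1)
    return merged @ Wo                                       # (P, d_llm)

rng = np.random.default_rng(0)
vocab, n_proto, d_llm, d_ts = 50, 10, 32, 16   # toy sizes (assumption)
n_patches, n_heads, d_k = 6, 4, 8

word_emb = rng.normal(size=(vocab, d_llm))     # stand-in for frozen LLM word embeddings
W_proto = rng.normal(size=(n_proto, vocab))
Wq = rng.normal(size=(d_ts, n_heads * d_k))
Wk = rng.normal(size=(d_llm, n_heads * d_k))
Wv = rng.normal(size=(d_llm, n_heads * d_k))
Wo = rng.normal(size=(n_heads * d_k, d_llm))
patches = rng.normal(size=(n_patches, d_ts))   # TS patch embeddings
prompt_emb = rng.normal(size=(5, d_llm))       # embedded text prompt

reprogrammed = reprogram(patches, word_emb, W_proto, Wq, Wk, Wv, Wo, n_heads)
# Prompt tokens first, then reprogrammed patches; this is the frozen LLM's input.
llm_input = np.concatenate([prompt_emb, reprogrammed], axis=0)
```

The reprogrammed patches live in the LLM's embedding dimension, which is what allows them to be concatenated with ordinary prompt embeddings.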

SpeechVerse: A Large-scale Generalizable Audio Language Model | Amazon AWS AI Labs

Use a CNN to downsample the audio embeddings (across all frames)
Use LoRA to tune the LLM
Use curriculum learning to deal with the gradient explosion observed when training both modules together: train the CNN first and ...
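
To make the downsampling step concrete, here is a minimal strided 1-D convolution over the frame axis in NumPy. The kernel size, stride, and feature sizes are assumptions chosen for illustration, not SpeechVerse's settings, and the sketch omits padding, bias, and nonlinearity.

```python
import numpy as np

def conv1d_downsample(frames, kernel, stride):
    """frames: (T, d_in); kernel: (k, d_in, d_out); returns (T_out, d_out)."""
    k = kernel.shape[0]
    t_out = (frames.shape[0] - k) // stride + 1
    # Each output step mixes k consecutive frames into one d_out vector.
    windows = [frames[t * stride : t * stride + k] for t in range(t_out)]
    return np.stack([np.einsum("kd,kdo->o", w, kernel) for w in windows])

rng = np.random.default_rng(0)
frames = rng.normal(size=(100, 24))    # 100 audio-encoder frames (toy size)
kernel = rng.normal(size=(3, 24, 32))  # k=3 frames, 24 -> 32 features
short = conv1d_downsample(frames, kernel, stride=3)
```

With a stride of 3, the sequence handed to the LLM is roughly a third of the original length.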

Q-Former BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models

Cross-modal alignment to leverage unimodal foundation models: bridge the modality gap with a lightweight Querying Transformer (Q-Former; e.g. BERT-based, 188M parameters)
Q-Former: a set of learnable query vectors extracts visual features from the frozen image encoder and acts as a bottleneck
2 stages to pre-train the Q-Former:
1. vision-language representation learning: learn visual representations that are more relevant to the text
2. vision-to-language generative learning: connect the output to a frozen LLM and train
Further: apply PEFT to the full model, to the Q-Former, or to both
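
The bottleneck idea can be illustrated with a single cross-attention step in NumPy: a fixed set of learnable queries attends to however many image features the frozen encoder produces, so the output length never changes. This is a sketch under assumptions; the real Q-Former also has self-attention, feed-forward layers, and multiple blocks, and the function name, sizes, and random weights here are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def query_bottleneck(queries, image_feats, Wq, Wk, Wv):
    Q = queries @ Wq                                   # (n_q, d)
    K = image_feats @ Wk                               # (n_img, d)
    V = image_feats @ Wv                               # (n_img, d)
    attn = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))     # (n_q, n_img)
    return attn @ V                                    # (n_q, d)

rng = np.random.default_rng(0)
n_q, d_q, d_img, d = 32, 48, 64, 48        # toy sizes (assumption)
queries = rng.normal(size=(n_q, d_q))      # learnable query vectors
Wq = rng.normal(size=(d_q, d))
Wk = rng.normal(size=(d_img, d))
Wv = rng.normal(size=(d_img, d))

# Frozen-encoder outputs of two different lengths (random stand-ins).
small = query_bottleneck(queries, rng.normal(size=(50, d_img)), Wq, Wk, Wv)
large = query_bottleneck(queries, rng.normal(size=(257, d_img)), Wq, Wk, Wv)
```

Both calls return `n_q` vectors, which is the fixed-size interface passed on to the frozen LLM.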

adaptor

(linear) projection

mapping network (Pengi)

Flamingo (Gated X-attn layers)
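
A minimal sketch of the tanh-gated cross-attention idea, in NumPy: text states attend to visual features, and the result is added back through a gate tanh(alpha) whose alpha starts at 0, so the frozen language model's behaviour is unchanged at initialization. Only the attention gate is shown (the gated feed-forward part is omitted), and the function name, sizes, and random weights are assumptions.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def gated_cross_attention(text, visual, Wq, Wk, Wv, alpha):
    Q = text @ Wq                                     # (n_text, d_k)
    K = visual @ Wk                                   # (n_vis, d_k)
    V = visual @ Wv                                   # (n_vis, d_text)
    attn = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))    # (n_text, n_vis)
    # Residual connection with a learnable scalar gate.
    return text + np.tanh(alpha) * (attn @ V)

rng = np.random.default_rng(0)
d_text, d_vis, d_k = 32, 24, 16            # toy sizes (assumption)
text = rng.normal(size=(7, d_text))        # hidden states of the frozen LM
visual = rng.normal(size=(10, d_vis))      # visual tokens
Wq = rng.normal(size=(d_text, d_k))
Wk = rng.normal(size=(d_vis, d_k))
Wv = rng.normal(size=(d_vis, d_text))

at_init = gated_cross_attention(text, visual, Wq, Wk, Wv, alpha=0.0)
opened = gated_cross_attention(text, visual, Wq, Wk, Wv, alpha=1.0)
```

With `alpha=0.0` the layer is an identity on the text states; as alpha moves away from zero during training, visual information is mixed in gradually.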