LLM
Yuwei (Evelyn) Zhang edited this page Jul 21, 2024 · 7 revisions
PneumoLLM: Harnessing the Power of Large Language Model for Pneumoconiosis Diagnosis
- Input: Chest Radiography
- Method: adaptor tuning (+ prompt tuning?)
- Use a pre-trained CLIP model for both the text and image encoders (CLAP in the audio case)
- Three trainable parts: adaptor, trainable prompt, classifier
- masked attention for the prompt (unidirectional attention)
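The notes do not say exactly which mask PneumoLLM applies to the prompt tokens. One common reading of "unidirectional attention" is a causal mask, where each token attends only to itself and earlier tokens. The NumPy sketch below illustrates that reading only; it is not the paper's implementation, and the sizes and function names are made up for the example.

```python
import numpy as np

def causal_mask(n):
    """Lower-triangular boolean mask: position i may attend to positions <= i."""
    return np.tril(np.ones((n, n), dtype=bool))

def masked_attention(q, k, v, mask):
    """Scaled dot-product attention; disallowed positions get -inf before softmax."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
tokens = rng.normal(size=(5, 8))  # 5 tokens (e.g. prompt tokens), width 8 (assumed sizes)
out, weights = masked_attention(tokens, tokens, tokens, causal_mask(5))
```

With this mask the first token can only see itself, so its output equals its own value vector, and every weight above the diagonal is zero.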
A foundation model for generalizable disease detection from retinal images | Nature 2023
- Input: retinal images
- Method: fine-tuning the entire model
- curate open datasets to train 2 models, one for each of the two modalities
- 8 datasets, three groups of evaluation tasks with fine-tuning, including external evaluation (fine-tune on A, test on B)
- MAE and 4 contrastive methods
- interpretability through a tool using saliency maps
- some analysis of distribution shift
Large Language Models for Cuffless Blood Pressure Measurement From Wearable Biosignals
- Data: handcrafted features from ECG and PPG
- Method: prompt design & fine-tuning
- context-enhanced prompts: domain knowledge + user information
- compared 10 LLMs and the amount of training data needed
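To make "context-enhanced prompt" concrete, here is a minimal sketch of assembling one from handcrafted features, user information and a line of domain knowledge. The feature names, user fields, wording and the `build_prompt` helper are all hypothetical; the paper's actual prompt template is not given in these notes.

```python
def build_prompt(features, user_info, domain_knowledge):
    """Assemble a context-enhanced prompt from three plain-text parts."""
    feature_text = ", ".join(f"{name}={value}" for name, value in features.items())
    user_text = ", ".join(f"{key}: {value}" for key, value in user_info.items())
    return (
        f"Background: {domain_knowledge}\n"
        f"User: {user_text}\n"
        f"Features: {feature_text}\n"
        "Task: estimate systolic and diastolic blood pressure in mmHg."
    )

prompt = build_prompt(
    # Hypothetical ECG/PPG features and user fields, for illustration only.
    features={"heart_rate_bpm": 72, "pulse_arrival_time_ms": 210},
    user_info={"age": 54, "sex": "F"},
    domain_knowledge="Shorter pulse arrival time tends to accompany higher blood pressure.",
)
print(prompt)
```

The same string could serve either as an inference-time prompt or as the input side of a fine-tuning example.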
Prefix tuning
Prompt tuning
Adaptor
LoRA
(IA)^3
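As a reminder of how two of the methods listed above modify a frozen linear layer, the sketch below shows a LoRA-style low-rank update and an (IA)^3-style learned rescaling in NumPy. It only illustrates the forward pass under assumed toy sizes; the variable names are made up, and real implementations attach these to specific attention and feed-forward layers.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank, alpha = 16, 8, 2, 4.0     # assumed toy sizes

W = rng.normal(size=(d_out, d_in))           # frozen pre-trained weight
A = rng.normal(size=(rank, d_in)) * 0.01     # trainable, small random init
B = np.zeros((d_out, rank))                  # trainable, zero init

def lora_forward(x, W, A, B, alpha, rank):
    """y = W x + (alpha / rank) * B A x; only A and B would be trained."""
    return x @ W.T + (alpha / rank) * (x @ A.T @ B.T)

def ia3_forward(x, W, scale):
    """(IA)^3-style rescaling: a learned vector multiplies activations element-wise."""
    return (x @ W.T) * scale

x = rng.normal(size=(3, d_in))
y_base = x @ W.T
y_lora = lora_forward(x, W, A, B, alpha, rank)
y_ia3 = ia3_forward(x, W, np.ones(d_out))    # scale vector initialised to ones

trainable_lora = A.size + B.size             # 48 parameters vs. 128 in W
```

Because `B` starts at zero and the (IA)^3 scale starts at one, both variants reproduce the frozen layer exactly before any training step.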
AutoLoRa: An Automated Robust Fine-Tuning Framework ICLR 2024 | code
- challenge: divergence between adversarial and natural objectives
- introduce a low-rank (LoRa) branch that disentangles robust fine-tuning (RFT) into two distinct components: optimizing natural objectives via the LoRa branch and adversarial objectives via the feature extractor
Time-LLM: Time Series Forecasting by Reprogramming Large Language Models
- Patch Reprogram: first reduce the word embeddings of the text vocabulary (32000) to text prototypes (2000) through a linear layer, then apply multi-head attention between the TS patch embeddings and the text prototype embeddings
- concatenate the prompt with the reprogrammed TS patch embeddings, feed them into the frozen LLM, and add a linear layer for forecasting
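The two steps above can be sketched in NumPy as follows. This is a simplified illustration, not the Time-LLM code: it uses a single attention head instead of multi-head attention, random matrices in place of trained weights, and scaled-down sizes (320 and 20) standing in for 32000 and 2000. All variable names are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
vocab, n_proto, d_llm = 320, 20, 16    # scaled-down stand-ins for 32000 and 2000
n_patches, d_patch = 6, 8

word_emb = rng.normal(size=(vocab, d_llm))             # frozen LLM word embeddings
to_proto = rng.normal(size=(n_proto, vocab)) * 0.05    # linear layer over the vocabulary axis
prototypes = to_proto @ word_emb                       # (n_proto, d_llm)

patches = rng.normal(size=(n_patches, d_patch))        # time-series patch embeddings
W_q = rng.normal(size=(d_patch, d_llm)) * 0.1
W_k = rng.normal(size=(d_llm, d_llm)) * 0.1
W_v = rng.normal(size=(d_llm, d_llm)) * 0.1

# Cross-attention: patches are queries, text prototypes are keys and values.
q, k, v = patches @ W_q, prototypes @ W_k, prototypes @ W_v
attn = softmax(q @ k.T / np.sqrt(d_llm))               # (n_patches, n_proto)
reprogrammed = attn @ v                                # (n_patches, d_llm)

prompt_emb = rng.normal(size=(4, d_llm))               # stand-in for embedded prompt tokens
llm_input = np.concatenate([prompt_emb, reprogrammed], axis=0)
```

`llm_input` is what would be passed to the frozen LLM; a linear head on the LLM outputs would then produce the forecast.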
SpeechVerse: A Large-scale Generalizable Audio Language Model | Amazon AWS AI Labs
- use a CNN to downsample the audio embedding (of all frames)
- use LoRA to tune the LLM
- use curriculum learning to deal with the gradient explosion observed when training both modules jointly: train the CNN first and ...
- cross-modal alignment to leverage unimodal foundation models, bridge the modality gap with a lightweight Query Transformer (e.g. BERT 188M)
- Q-Former: set of learnable query vectors to extract visual features from frozen image encoder, act as bottleneck
- 2 stages to pre-train the Q-Former:
  - vision-language representation learning: learn visual representations more relevant to the text
  - vision-to-language generative learning: connect the output to a frozen LLM and train
- further: PEFT the full model, the Q-Former, or both
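The "bottleneck" role of the learnable queries can be illustrated with a small NumPy sketch: however many feature vectors the frozen encoder emits, cross-attention from a fixed set of queries returns a fixed number of output tokens. This is a single-head toy without learned projections, self-attention or text interaction, so it is far simpler than an actual Q-Former; all sizes and names are assumptions.

```python
import numpy as np

def cross_attend(queries, features):
    """Single-head cross-attention without learned projections, for brevity."""
    scores = queries @ features.T / np.sqrt(queries.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ features

rng = np.random.default_rng(0)
n_queries, d = 4, 16
queries = rng.normal(size=(n_queries, d))   # learnable query vectors (the trainable part)

# Frozen-encoder outputs of different lengths, e.g. different numbers of image patches.
small_image = rng.normal(size=(49, d))
large_image = rng.normal(size=(196, d))

out_small = cross_attend(queries, small_image)
out_large = cross_attend(queries, large_image)
```

Both outputs have shape `(n_queries, d)`, which is what keeps the input handed to the frozen LLM short and constant in length.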
adaptor
(linear) projection
mapping network (Pengi)
Flamingo (gated cross-attention layers)
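Two of the bridging options listed above, a linear projection and a Flamingo-style gated cross-attention layer, can be sketched together in NumPy. This is a toy under assumed sizes, with random matrices in place of trained weights and a single attention head; the names are made up. What it keeps from Flamingo is the tanh gate initialised at zero.

```python
import numpy as np

def cross_attend(x, visual):
    """Single-head cross-attention from text states x to visual features."""
    scores = x @ visual.T / np.sqrt(x.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ visual

def gated_xattn(x, visual, gate):
    """Residual cross-attention scaled by tanh(gate); gate is a learned scalar."""
    return x + np.tanh(gate) * cross_attend(x, visual)

rng = np.random.default_rng(0)
d_vis, d_llm = 12, 16                          # assumed widths
W_proj = rng.normal(size=(d_vis, d_llm)) * 0.1

visual_raw = rng.normal(size=(10, d_vis))      # frozen visual encoder outputs
visual = visual_raw @ W_proj                   # linear projection to the LLM width
x = rng.normal(size=(5, d_llm))                # hidden states of the frozen LLM

y_init = gated_xattn(x, visual, gate=0.0)      # gate starts at zero
y_open = gated_xattn(x, visual, gate=1.0)      # gate after (hypothetical) training
```

At initialisation `tanh(0) = 0`, so the layer is an identity and the frozen LLM behaves exactly as before; visual information is mixed in only as the gate moves away from zero during training.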