LLM

Yuwei (Evelyn) Zhang edited this page Jun 29, 2024 · 7 revisions

PneumoLLM

Use a pre-trained CLIP model for both the text encoder and the image encoder (CLAP in the audio case)
Add an adapter, a trainable prompt, and a classifier; these three parts are trainable
Mask the attention for the prompt tokens, so that attention is unidirectional
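
The notes do not say which direction the prompt mask runs, so the following is only a minimal sketch of one possible reading: prompt tokens attend only to other prompt tokens, while input tokens attend to everything, so information flows one way from the prompt into the input. The token layout (prompt first, input after) and the names `build_prompt_attention_mask` and `masked_softmax` are hypothetical and do not come from PneumoLLM.

```python
import math
from typing import List


def build_prompt_attention_mask(num_prompt: int, num_input: int) -> List[List[bool]]:
    """Return mask[q][k] == True when query position q may attend to key position k.

    Assumed layout (hypothetical): positions 0..num_prompt-1 are prompt tokens,
    the remaining num_input positions are input (image/text/audio) tokens.
    """
    total = num_prompt + num_input
    mask = []
    for q in range(total):
        q_is_prompt = q < num_prompt
        row = []
        for k in range(total):
            k_is_prompt = k < num_prompt
            # Assumption: prompt queries see only prompt keys;
            # input queries see both prompt and input keys.
            row.append(k_is_prompt or not q_is_prompt)
        mask.append(row)
    return mask


def masked_softmax(scores: List[float], mask_row: List[bool]) -> List[float]:
    """Softmax over one row of attention scores, giving zero weight to masked keys.

    Requires at least one allowed key in mask_row (true for every row built above
    when num_prompt >= 1).
    """
    masked = [s if allowed else float("-inf") for s, allowed in zip(scores, mask_row)]
    peak = max(masked)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in masked]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    mask = build_prompt_attention_mask(num_prompt=2, num_input=3)
    for row in mask:
        print("".join("1" if allowed else "0" for allowed in row))
    # Attention weights for the first prompt token, with uniform raw scores.
    print(masked_softmax([0.0] * 5, mask[0]))
```

If the intended direction is the opposite one (or a causal, left-to-right mask), only the condition inside `build_prompt_attention_mask` changes; the masked softmax stays the same.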