Yuwei (Evelyn) Zhang edited this page Jun 29, 2024 · 7 revisions

PneumoLLM

  • Use a pre-trained CLIP model for both the text encoder and the image encoder (CLAP in the audio case).
  • Add three trainable parts: an adapter, a trainable prompt, and a classifier.
  • Mask the attention for the prompt so that it is unidirectional.
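
The notes above do not say exactly how the prompt attention is masked, so the following is only a minimal sketch of one possible reading. It assumes the sequence is laid out as input (image or audio) tokens followed by trainable prompt tokens, that input tokens never attend to prompt tokens, and that each prompt token attends to all input tokens and to prompt tokens at or before its own position. The function name `build_prompt_mask` and its parameters are hypothetical and do not come from the PneumoLLM code.

```python
def build_prompt_mask(n_input: int, n_prompt: int) -> list[list[bool]]:
    """Return allowed[q][k]: True if query token q may attend to key token k.

    Assumed token order (an illustration, not taken from the source):
    n_input input tokens first, then n_prompt trainable prompt tokens.
    """
    total = n_input + n_prompt
    allowed = [[False] * total for _ in range(total)]
    for q in range(total):
        for k in range(total):
            if k < n_input:
                # Every token may read the input (image or audio) tokens.
                allowed[q][k] = True
            elif q >= n_input:
                # Prompt queries see prompt keys at or before their own position.
                allowed[q][k] = k <= q
            # Otherwise: input query, prompt key. Stays False (masked).
    return allowed


mask = build_prompt_mask(n_input=3, n_prompt=2)
for row in mask:
    # "1" marks an allowed query/key pair, "." a masked one.
    print("".join("1" if a else "." for a in row))
```

Under these assumptions, information flows one way, from the input tokens into the prompt tokens. In a real model the boolean matrix would typically be converted into an additive mask (a large negative value where attention is disallowed) before the softmax.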
