This repository contains the official implementation of "Acoustic Prompt Tuning: Empowering Large Language Models with Audition Capabilities".
We introduce APT, an audio adapter that extends LLMs/VLMs to the audio domain through soft prompt tuning. APT-enhanced LLMs (APT-LLMs) demonstrate strong audio understanding on several downstream audio tasks, such as audio captioning, few-shot audio classification, and natural language audio reasoning.
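To make the idea of soft prompting concrete, here is a minimal, dependency-free sketch of the general technique: audio features are mapped into the LLM's embedding space and prepended to the text token embeddings, so that a language model can attend to them like ordinary tokens. This is an illustration only, not the APT implementation. The names `project` and `build_llm_input`, the single linear projection, and all dimensions are assumptions made for this example; the actual adapter architecture is described in the paper.

```python
from typing import List

Vector = List[float]


def project(feature: Vector, weight: List[Vector]) -> Vector:
    """Map one audio feature vector into the (hypothetical) LLM embedding space.

    `weight` has one row per output dimension; this stands in for whatever
    learned adapter produces the soft prompts.
    """
    return [sum(w * x for w, x in zip(row, feature)) for row in weight]


def build_llm_input(
    audio_features: List[Vector],
    text_embeddings: List[Vector],
    weight: List[Vector],
) -> List[Vector]:
    """Prepend audio-derived soft prompts to the text token embeddings."""
    soft_prompts = [project(f, weight) for f in audio_features]
    return soft_prompts + text_embeddings


# Toy example: 2 audio frames (dim 3), 3 text tokens (dim 4).
weight = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]]  # assumed 3 -> 4 projection
audio_features = [[1.0, 2.0, 3.0], [0.5, 0.5, 0.5]]
text_embeddings = [[0.0] * 4, [1.0] * 4, [2.0] * 4]

llm_input = build_llm_input(audio_features, text_embeddings, weight)
print(len(llm_input), len(llm_input[0]))
```

In a soft prompt tuning setup, only the adapter parameters (here, `weight`) would be trained; the sequence returned by `build_llm_input` is what a language model would consume in place of plain text embeddings.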
- APT model and checkpoint release.
- APT inference code release.
- Natural language audio reasoning database release.