Audio Features Meta-Issue #1272

Audio models are newly supported in SwarmUI, so this meta-issue is a list of key things to work on. PRs are welcome for individual pieces.
- [ ] The batch view mishandles audio pretty badly.
- [ ] The center output doesn't do a great job with audio either.
- [ ] Need a proper custom audio control, probably piggybacking on the existing custom video control.
- [x] Need "Init Audio" as a concept and parameters for it. Probably inside the Text2Audio group? A sub-group? Or separate? Visibility toggle? Does LTX-2 reuse the same params?
- [x] Need to rewrite workflow generator code to use the new `NodeOutData` more often, and have different stages play nice and adapt according to whether they're operating on audio/video/images
- [ ] Need an "Edit Audio" UI that at the very least allows you to mask sections of an init audio coherently
- [ ] Need a "SwarmSaveAudioWS" akin to the image/animation ws save options
- [ ] Need user settings to control audio format (wav/mp3/flac/...?) and Swarm code to appropriately convert between and save them
- [ ] Need to handle metadata saving in audio files
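
To make the "Edit Audio" masking item a bit more concrete, here is a minimal Python sketch of one way a UI selection could become a mask: time ranges in seconds are turned into a per-sample 0/1 list. The function name `build_sample_mask` and its parameters are hypothetical and not part of SwarmUI; the real implementation would live in Swarm's own code and may look quite different.

```python
def build_sample_mask(duration_s, sample_rate, masked_ranges):
    """Return a list with one 0/1 entry per audio sample.

    masked_ranges is a list of (start_seconds, end_seconds) pairs, as a
    hypothetical "Edit Audio" UI might produce. Overlapping ranges are fine,
    and ranges that run past the clip are clamped to its length.
    """
    total = round(duration_s * sample_rate)
    mask = [0] * total
    for start_s, end_s in masked_ranges:
        start = max(0, round(start_s * sample_rate))
        end = min(total, round(end_s * sample_rate))
        if end > start:
            # Mark every sample inside this range as masked (to be regenerated).
            mask[start:end] = [1] * (end - start)
    return mask


# Tiny example: a 1 second clip at 10 samples per second.
example_mask = build_sample_mask(1.0, 10, [(0.2, 0.5), (0.4, 0.7), (0.9, 2.0)])
```

A per-sample mask is only one option; a model that works in a latent space would likely need the mask resampled to the latent length instead.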

Things that need more design thought (poke with ideas on Discord in the development channel):
- [ ] How to do piping between different stages well? Say for example, Ace-Step generates a song and LTX-2 uses that song as its audio layer.
- [ ] How should multi-section prompts work? (lyrics vs style vs etc). I've initially done Ace-Step as Prompt=Lyrics and Style is a separate box, but that feels weird. Maybe dynamically spawn a new central box? Maybe prompt regions with `<tags>`?
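
As a discussion aid for the "prompt regions with `<tags>`" idea, here is a rough Python sketch of splitting a single prompt into named sections. The tag names `<lyrics>` and `<style>`, the function `split_prompt_sections`, and the rule that untagged leading text counts as lyrics are all assumptions for illustration, not existing SwarmUI prompt syntax.

```python
import re

# Hypothetical section markers; the real tag names and syntax are undecided.
SECTION_TAG = re.compile(r"<(lyrics|style)>")


def split_prompt_sections(prompt, default="lyrics"):
    """Split a prompt into sections keyed by tag name.

    Text before the first tag goes to the default section. A tag switches
    the current section until the next tag. Repeated sections are joined
    with newlines in the order they appear.
    """
    sections = {}
    current = default
    pos = 0
    for match in SECTION_TAG.finditer(prompt):
        text = prompt[pos:match.start()].strip()
        if text:
            sections.setdefault(current, []).append(text)
        current = match.group(1)
        pos = match.end()
    tail = prompt[pos:].strip()
    if tail:
        sections.setdefault(current, []).append(tail)
    return {name: "\n".join(parts) for name, parts in sections.items()}


example_sections = split_prompt_sections(
    "la la la <style> upbeat synthpop <lyrics> second verse"
)
```

With this shape, the backend could map each section to whichever model input it belongs to, while the user still types into a single prompt box.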
