Audio Features Meta-Issue #1272

@mcmonkey4eva

Audio models are newly supported in SwarmUI, so this meta-issue is a list of key things to work on. PRs are welcome for individual pieces.

  • The batch view mishandles audio pretty badly.
  • The center output doesn't do a great job with audio either.
  • Need a proper custom audio control, probably piggybacking on the existing custom video control
  • Need "Init Audio" as a concept and parameters for it. Probably inside the Text2Audio group? A sub-group? Or separate? Visibility toggle? Does LTX2 reuse the same params?
  • Need to rewrite workflow generator code to use the new NodeOutData more often, and have different stages play nice and adapt according to whether they're operating on audio/video/images
  • Need an "Edit Audio" UI that at the very least allows you to mask sections of an init audio coherently
  • Need a "SwarmSaveAudioWS" akin to the image/animation ws save options
  • Need user settings to control audio format (wav/mp3/flac/...?) and Swarm code to appropriately convert between formats and save them
  • Need to handle metadata saving in audio files
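
To make the "mask sections of an init audio" item a bit more concrete, here is a minimal sketch of turning user-selected time ranges into a per-sample mask. Everything in it is an assumption for illustration: `MaskSection`, `build_audio_mask`, and the convention that 1.0 means "regenerate this sample" and 0.0 means "keep the init audio" are hypothetical and not existing SwarmUI or ComfyUI APIs. It is written in Python only because the backend nodes are Python; the real UI and C# side would look different.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class MaskSection:
    """Hypothetical time range (in seconds) the user marked for regeneration."""
    start_sec: float
    end_sec: float


def build_audio_mask(duration_sec: float, sample_rate: int,
                     sections: Iterable[MaskSection]) -> List[float]:
    """Return one mask value per sample: 1.0 inside any section, else 0.0.

    Assumed convention (not from the issue): 1.0 = regenerate, 0.0 = keep.
    Ranges are clamped to the clip, so out-of-bounds input is tolerated.
    """
    total = int(round(duration_sec * sample_rate))
    mask = [0.0] * total
    for section in sections:
        start = max(0, int(round(section.start_sec * sample_rate)))
        end = min(total, int(round(section.end_sec * sample_rate)))
        for i in range(start, end):  # empty if end <= start
            mask[i] = 1.0
    return mask


# Usage: a 1-second clip at a toy 10 Hz sample rate, masking 0.2s to 0.5s.
example_mask = build_audio_mask(1.0, 10, [MaskSection(0.2, 0.5)])
```

A hard 0/1 mask like this would likely need a short crossfade at section edges to be "coherent" in practice; that part is left open here.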

Things that need more design thought (poke with ideas on Discord in the development channel):

  • How to do piping between different stages well? Say for example, Ace-Step generates a song and LTX-2 uses that song as its audio layer.
  • How should multi-section prompts work? (lyrics vs. style, etc.) I've initially done Ace-Step as Prompt=Lyrics with Style as a separate box, but that feels weird. Maybe dynamically spawn a new central box? Maybe prompt regions with <tags>?
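
As one way to picture the "prompt regions with <tags>" idea, the sketch below splits a single prompt into named sections. The tag names `<lyrics>` and `<style>`, the function `split_prompt_sections`, and the rule that untagged text falls into the lyrics section (mirroring the current Prompt=Lyrics behavior) are all assumptions for discussion, not existing SwarmUI syntax.

```python
import re
from typing import Dict

# Hypothetical section tags; the real syntax is still an open design question.
SECTION_TAG = re.compile(r"<(lyrics|style)>")


def split_prompt_sections(prompt: str, default: str = "lyrics") -> Dict[str, str]:
    """Split a prompt into sections keyed by tag name.

    Text before the first tag goes to `default`. A tag that appears more
    than once has its pieces joined with newlines, in order of appearance.
    """
    sections: Dict[str, str] = {}

    def add(name: str, text: str) -> None:
        text = text.strip()
        if text:
            sections[name] = f"{sections[name]}\n{text}" if name in sections else text

    current = default
    pos = 0
    for match in SECTION_TAG.finditer(prompt):
        add(current, prompt[pos:match.start()])
        current = match.group(1)
        pos = match.end()
    add(current, prompt[pos:])
    return sections


# Usage: untagged text is treated as lyrics, tagged text is routed by name.
parsed = split_prompt_sections("la la la <style> synthwave, 120 bpm <lyrics> second verse")
```

The upside of this shape is that a plain prompt with no tags keeps working unchanged; the downside is that it hides the style input inside the main box, which may or may not feel less weird than a separate one.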

Metadata

Labels: C# (This pertains to the C# engine), Easy PR (Want to contribute? Here's a good thing to try), Feature (New feature or request)
