Proposal: enhancements and redesign of `Audio` #5321

hannahblair · 2023-08-24T11:23:35Z

I have searched to see if a similar issue already exists.

Is your feature request related to a problem? Please describe.
The limitations of Audio have been a frequent point of discussion, and with increased usage and 4.0 coming up, now is a good time to address the bugs and feature requests and improve the developer and user experience around it. The key issues concern the user experience: the editing functionality is ambiguous, the slider is finicky to use, the design needs an update, and there are a few trimming/device bugs to tackle. There is also some additional functionality which would be useful to implement.

Describe the solution you'd like
A new Audio component, which has a new UI and additional functionality. This issue serves as a discussion around its updated implementation.

Functionality

Trimming/Cropping
- Selecting the edit/pencil/scissors icon will enable the trimming mode
- The trimming mode allows you to select a segment of audio
- If a particular segment of audio is currently selected, the play icon will play the selected segment
- The cropped selection of audio is sent to the prediction function on submit
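As a rough illustration of the last point, here is a minimal sketch of cropping a clip to the selected segment before it reaches the prediction function. `crop_audio` and its parameters are hypothetical names, not part of Gradio. The `(sample_rate, samples)` shape mirrors what `gr.Audio` passes with `type="numpy"`, but a plain list stands in for the array to keep the example dependency-free.

```python
from typing import List, Sequence, Tuple


def crop_audio(
    sample_rate: int,
    samples: Sequence[float],
    start_s: float,
    end_s: float,
) -> Tuple[int, List[float]]:
    """Return only the samples inside [start_s, end_s) seconds.

    Hypothetical helper for illustration; not an existing Gradio function.
    """
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    duration_s = len(samples) / sample_rate
    # Clamp the selection to the clip so a handle dragged past either end is harmless.
    start_s = max(0.0, min(start_s, duration_s))
    end_s = max(start_s, min(end_s, duration_s))
    # Convert seconds to sample indices.
    start_idx = int(round(start_s * sample_rate))
    end_idx = int(round(end_s * sample_rate))
    return sample_rate, list(samples[start_idx:end_idx])
```

Clamping rather than raising is a design choice here: the selection comes from UI handles, so out-of-range values are more likely imprecise dragging than a programming error.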
Input/Output
- Selecting a device icon should show the available I/O devices to allow users to select devices other than their system defaults.
- It should be clear if either the microphone or the speaker permissions are not enabled. This would help combat confusion around whether the Gradio component is actually working or not.
Enforce limits
- A new param max_length should be implemented in Audio.py to allow developers to limit the length of Audio that can be processed. This is pretty valuable as developers often want to limit queue load times.
- We could also have a min_length so that developers can ensure meaningful predictions are made in their apps.
- This could be one param passed in like [5, 20]. We should allow not setting one or the other like [None, 20]. Question on that - what should that be named? We currently just use minimum and maximum for gr.Slider but it should be clear that this is the min + max of what a user of their app can process.
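To make the limit semantics concrete, here is a minimal sketch of the check that could run before audio is processed. `min_length` and `max_length` are the parameter names floated above; `check_audio_length` is a hypothetical helper, and whether a real implementation would raise, truncate, or warn is still an open question in this proposal.

```python
from typing import Optional


def check_audio_length(
    duration_s: float,
    min_length: Optional[float] = None,
    max_length: Optional[float] = None,
) -> None:
    """Raise ValueError if the clip duration falls outside the allowed range.

    Passing None for either bound disables it, matching the [None, 20] idea.
    Hypothetical helper for illustration; not an existing Gradio function.
    """
    if min_length is not None and max_length is not None and min_length > max_length:
        raise ValueError("min_length must not exceed max_length")
    if min_length is not None and duration_s < min_length:
        raise ValueError(
            f"Audio is {duration_s:.1f}s long; the minimum allowed is {min_length}s"
        )
    if max_length is not None and duration_s > max_length:
        raise ValueError(
            f"Audio is {duration_s:.1f}s long; the maximum allowed is {max_length}s"
        )
```

For example, `check_audio_length(10, min_length=5, max_length=20)` passes silently, while a 30 second clip with `max_length=20` raises a `ValueError`.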
Label regions
- This is probably a feature for either a future iteration or a custom component. An issue has been raised (Apply color labels to regions of Audio component #2023) around labelling certain sections of audio, which is quite an interesting one. The suggestion is that you can pass a list of tuples like [(start_time_seconds, end_time_seconds, region_label)]. It could be like YouTube's region labelling.
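As a rough illustration of the tuple format suggested in #2023, here is a minimal sketch that validates a list of `(start_time_seconds, end_time_seconds, region_label)` tuples against the clip duration. `validate_regions` and the `Region` alias are hypothetical names; nothing like this exists in Gradio today.

```python
from typing import List, Tuple

# (start_time_seconds, end_time_seconds, region_label), as suggested in the issue.
Region = Tuple[float, float, str]


def validate_regions(regions: List[Region], duration_s: float) -> List[Region]:
    """Return the regions sorted by start time, rejecting any that are malformed.

    Hypothetical helper for illustration; not an existing Gradio function.
    """
    cleaned: List[Region] = []
    for start, end, label in regions:
        # Each region must be non-empty and lie fully inside the clip.
        if not (0 <= start < end <= duration_s):
            raise ValueError(
                f"Region {label!r} ({start}s to {end}s) is outside 0s to {duration_s}s"
            )
        cleaned.append((float(start), float(end), str(label)))
    return sorted(cleaned, key=lambda region: region[0])
```

This sketch allows overlapping regions; whether overlaps should be permitted is a design question the proposal does not settle.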
Download
- @osanseviero mentioned a limitation of Audio Spaces (see here): Spaces that output waveform videos can only offer the result for download as a video, which is frustrating for users who just want the audio file. We currently have to use gr.Video as the output for make_waveform, which means downloading generated audio will give you a video, and not the generated audio. The solution to this (as discussed by Omar, @abidlabs, and @dawoodkhan82) is to allow the Audio component to display the output of make_waveform, which will ensure that the downloaded output will actually be the generated audio. Open to thoughts here!

Design

Sound
- It'd be cool to rethink the design of the seek slider. An interactive waveform format would be nice but not trivial to recreate. There's a cool JS library, wavesurfer.js, which does this, but I've yet to delve into the details of this lib.
- Subpoint of this - if we do customise the track, it'd definitely mean a custom element, but we'd ensure it inherits all the behaviour from the HTMLAudioElement API
Menu
- We currently have the kebab menu to access Playback and Download. We could expand this, keep it as is, or remove it and implement the next point:
- In line with the planned design for Image (Image proposal #5055), it would be good for consistency to have a similar shared menu which appears below the component when prompted. This would be where device, playback and trimming functionality would be placed.
Trimming/Cropping - The UX of this functionality is a key issue, as right now the slider is tricky to use, a little clunky and could be prettier.
- Selecting the scissors icon will enable the trimming mode. This could be done via a range slider, or just by adjusting the ends of the video (see Kapwing's implementation). I like the latter implementation, as it avoids some complicated logic around overlapping range nubs.
- Change the edit pencil icon to scissors. This seems to be the standard for Audio editors and would be better recognised by users.
Accessibility
- Core requirements of an accessible audio player should be implemented: Provide keyboard support, make the keyboard focus indicator visible, provide clear labels, and have sufficient contrast between colours for text, controls, and backgrounds.
- Captions. Has this been discussed or requested? The ability to pass a file with captions is a key feature of accessible media players. Likely for another iteration.

With the Image component refactor underway (#5055), it would be good to stay fairly aligned on the UI. E.g., the proposed menu at the bottom could be a shared interface, and down the line, the Video component could use this too. That said, a lot of this functionality could actually be shared with the Video component (see #3855). It could be cool to merge the two at some point.

All in all, these changes are fairly straightforward and this isn't a complex redesign (🤞). The key here is addressing the user experience of the component and the addition of core media player functionality (though download and region labelling may require some thinking).

Do let me know if there's anything you'd like to discuss or ask, and anything I've forgotten to address! I'll update this issue with developments as I go.

References

app.mediabits.io

playplay.com

kapwing.com

Audio Issues

Feature Requests

Bugs

Related

Merge Video and Audio to the new component #3855


hannahblair · 2023-08-24T14:50:02Z

I'll add designs here shortly

abidlabs · 2023-08-25T16:16:09Z

Really comprehensive proposal @hannahblair! I think you've covered everything that's been on my radar.

Captions. Has this been discussed or requested? The ability to pass a file with captions is a key feature of accessible media players. Likely for another iteration.

This hasn't been requested as far as I know, so we could potentially add this after 4.0 using a similar API as subtitles in the gr.Video component (definitely makes sense to add captions to make the component accessible.)

The solution to this (as discussed by Omar, @abidlabs, and @dawoodkhan82) is to allow the Audio component to display the output of make_waveform, which will ensure that the downloaded output will actually be the generated audio. Open to thoughts here!

I suppose a general solution here is to allow the gr.Audio component to play video files if passed in -- but if you use it programmatically (e.g. via the client), you'll only get the audio file.

Label regions: this is probably a feature for either a future iteration or a custom component

Feels like a custom component to me

A new param max_length should be implemented in Audio.py to allow developers to limit the length of Audio that can be processed...

Good point about min_length and max_length. I think it's fine to keep them as separate parameters for ease of reference.

pngwn · 2023-08-25T19:37:16Z

On the point about consistency with the Image components, I wouldn't be too concerned with that as they are so different. Obviously, we should be consistent where it makes sense. I think it is more important that the video and audio are consistent as they are quite similar in terms of functionality (at least they are today).

Regarding edit mode, would it be possible to be in edit mode without needing to click anything? You could scrub the audio by selecting / dragging within the waveform and crop by dragging the handles at the edges.

Everything else sounds great to me! Thanks for putting this together @hannahblair!

hannahblair changed the title ~~Audio Proposal~~ Proposal: enhancements and redesign of Audio Aug 24, 2023

hannahblair changed the title ~~Proposal: enhancements and redesign of Audio~~ Proposal: enhancements and redesign of `Audio` Aug 24, 2023

hannahblair assigned abidlabs, aliabid94, aliabd, dawoodkhan82, pngwn, hysts and freddyaboulton Aug 24, 2023

abidlabs added the 🎵 Audio Related to Audio component label Aug 25, 2023

abidlabs mentioned this issue Sep 4, 2023: Scale parameter not working on upload audio widget #5406 (Closed, 1 task)

abidlabs mentioned this issue Oct 23, 2023: Improve Audio Component #5966 (Merged, 33 tasks)

hannahblair closed this as completed Nov 1, 2023
