Proposal: enhancements and redesign of Audio #5321

Closed
11 of 14 tasks
hannahblair opened this issue Aug 24, 2023 · 3 comments
Labels
🎵 Audio Related to Audio component

Comments

@hannahblair
Collaborator

hannahblair commented Aug 24, 2023

  • I have searched to see if a similar issue already exists.

Is your feature request related to a problem? Please describe.
The limitations of Audio have been a frequent point of discussion, and with increased usage and 4.0 coming up, now is a good time to address the bugs and feature requests and improve the developer and user experience. The key issues are around the user experience: the editing functionality is ambiguous, the slider is finicky to use, the design needs an update, and there are a few trimming/device bugs to tackle. There is also some additional functionality that would be useful to implement.

Describe the solution you'd like
A new Audio component, which has a new UI and additional functionality. This issue serves as a discussion around its updated implementation.

Functionality

  • Trimming/Cropping

    • Selecting the edit/pencil/scissors icon will enable the trimming mode
    • The trimming mode allows you to select a segment of audio
    • If a particular segment of audio is currently selected, the play icon will play the selected segment
    • The cropped selection of audio is sent to the prediction function on submit
  • Input/Output

    • Selecting a device icon should show the available I/O devices to allow users to select devices other than their system defaults.
    • It should be clear if either the microphone or the speaker permissions are not enabled. This would help combat confusion around whether the Gradio component is actually working or not.
  • Enforce limits

    • A new param max_length should be implemented in Audio.py to allow developers to limit the length of Audio that can be processed. This is pretty valuable as developers often want to limit queue load times.
    • We could also have a min_length so that developers can ensure meaningful predictions are made in their apps.
    • This could be one param passed in like [5, 20]. We should allow not setting one or the other like [None, 20]. Question on that - what should that be named? We currently just use minimum and maximum for gr.Slider but it should be clear that this is the min + max of what a user of their app can process.
  • Label regions

    • This is probably a feature for either a future iteration or a custom component. An issue has been raised (Apply color labels to regions of Audio component #2023) around labelling certain sections of audio, which is quite an interesting idea. The suggestion is that you can pass a list of tuples like [(start_time_seconds, end_time_seconds, region_label)]. It could be like YouTube's region labelling:
    • Screenshot 2023-08-24 at 16 32 05 Screenshot 2023-08-24 at 16 42 15
  • Download

    • @osanseviero mentioned a limitation of Audio spaces (see here): spaces which output waveform videos can only offer the result as a video download, which is frustrating for users who just want the audio file. We currently have to use gr.Video as the output for make_waveform, which means downloading generated audio gives you a video and not the generated audio. The solution to this (as discussed by Omar, @abidlabs, and @dawoodkhan82) is to allow the Audio component to display the output of make_waveform, which will ensure that the downloaded output is actually the generated audio. Open to thoughts here!
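
As a rough illustration of the trimming point above (the cropped selection is what gets sent to the prediction function), here is a minimal sketch of the crop step. It assumes the audio arrives as a sample rate plus a sequence of samples and that the selection is given in seconds; `trim_audio` and its parameters are hypothetical names, not existing Gradio API.

```python
from typing import Sequence, Tuple


def trim_audio(
    sample_rate: int,
    samples: Sequence[float],
    start_s: float,
    end_s: float,
) -> Tuple[int, Sequence[float]]:
    """Return only the samples between start_s and end_s (in seconds).

    Hypothetical helper for illustration; not part of Gradio.
    """
    duration_s = len(samples) / sample_rate
    # Clamp the selection so it never falls outside the clip.
    start_s = max(0.0, min(start_s, duration_s))
    end_s = max(start_s, min(end_s, duration_s))
    # Convert the selection from seconds to sample indices.
    start_idx = int(round(start_s * sample_rate))
    end_idx = int(round(end_s * sample_rate))
    return sample_rate, samples[start_idx:end_idx]


# A 4-second clip at 10 samples per second, trimmed to the 1s..3s selection.
sr, clip = trim_audio(10, list(range(40)), 1.0, 3.0)
```

The same slicing works on a NumPy array of samples, so only the selected segment would need to reach the prediction function.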

Design

  • Sound

    • It'd be cool to rethink the design of the seek slider. An interactive waveform format would be nice but not trivial to recreate. There's a cool JS library, wavesurfer.js, which does this, but I've yet to delve into its details. Screenshot 2023-08-24 at 18 43 29
    • Subpoint of this - if we do customise the track, it'd definitely mean a custom element, but we'd ensure it inherits all the behaviour of the HTMLAudioElement API
  • Menu

    • We currently have the kebab menu (Screenshot 2023-08-24 at 15 14 20) to access Playback and Download. We could expand this, keep it as is, or remove and implement the next point:
    • In line with the planned design for Image (Image proposal #5055), it would be good for consistency to have a similar shared menu which appears below the component when prompted. This would be where device, playback and trimming functionality would be placed.
  • Trimming/Cropping - The UX of this functionality is a key issue, as right now the slider is tricky to use, a little clunky and could be prettier.

    • Selecting the scissors icon will enable the trimming mode. This could be done via a range slider, or by just adjusting the ends of the video (see Kapwing's implementation). I prefer the latter, as it avoids some complicated logic around overlapping range nubs.
    • Change the edit pencil icon to scissors. This seems to be the standard for Audio editors and would be better recognised by users. Screenshot 2023-08-24 at 13 38 08
  • Accessibility

    • Core requirements of an accessible audio player should be implemented: Provide keyboard support, make the keyboard focus indicator visible, provide clear labels, and have sufficient contrast between colours for text, controls, and backgrounds.
    • Captions. Has this been discussed or requested? The ability to pass a file with captions is a key feature of accessible media players. Likely for another iteration.

With the Image component refactor underway (#5055), it would be good to stay fairly aligned on the UI. E.g., the proposed menu at the bottom could be a shared interface, and down the line, the Video component could use this too. That said, a lot of this functionality could actually be shared with the Video component (see #3855). It could be cool to merge the two at some point.

All in all, these changes are fairly straightforward and this isn't a complex redesign (🤞). The key here is addressing the user experience of the component and the addition of core media player functionality (though download and region labelling may require some thinking).

Do let me know if there's anything you'd like to discuss or ask, and anything I've forgotten to address! I'll update this issue with developments as I go.

References

app.mediabits.io

Screenshot 2023-08-24 at 14 33 24

playplay.com

Screenshot 2023-08-24 at 14 37 13

kapwing.com

Screenshot 2023-08-24 at 14 39 26

Audio Issues

Feature Requests

Bugs

Related

@hannahblair hannahblair changed the title Audio Proposal Proposal: enhancements and redesign of Audio Aug 24, 2023
@hannahblair hannahblair changed the title Proposal: enhancements and redesign of Audio Proposal: enhancements and redesign of Audio Aug 24, 2023
@hannahblair
Collaborator Author

I'll add designs here shortly

@abidlabs abidlabs added the 🎵 Audio Related to Audio component label Aug 25, 2023
@abidlabs
Member

Really comprehensive proposal @hannahblair! I think you've covered everything that's been on my radar.

Captions. Has this been discussed or requested? The ability to pass a file with captions is a key feature of accessible media players. Likely for another iteration.

This hasn't been requested as far as I know, so we could potentially add this after 4.0 using a similar API as subtitles in the gr.Video component (definitely makes sense to add captions to make the component accessible.)

The solution to this (as discussed by Omar, @abidlabs, and @dawoodkhan82) is to allow the Audio component to display the output of make_waveform, which will ensure that the downloaded output will actually be the generated audio. Open to thoughts here!

I suppose a general solution here is to allow the gr.Audio component to play video files if passed in -- but if you use it programmatically (e.g. via the client), you'll only get the audio file.

Label regions: this is probably a feature for either a future iteration or a custom component

Feels like a custom component to me
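
For reference, the region format suggested in the proposal, [(start_time_seconds, end_time_seconds, region_label)], could be checked with something like the sketch below before rendering. `validate_regions` and `duration_s` are hypothetical names used only for illustration; nothing like this exists in Gradio today as far as this thread shows.

```python
from typing import List, Tuple

# (start_time_seconds, end_time_seconds, region_label), as suggested in the proposal.
Region = Tuple[float, float, str]


def validate_regions(regions: List[Region], duration_s: float) -> List[Region]:
    """Check that every region fits inside the clip, then sort by start time.

    Hypothetical helper for illustration; not part of Gradio.
    """
    for start, end, label in regions:
        # A region must start before it ends and stay within the clip.
        if not 0 <= start < end <= duration_s:
            raise ValueError(f"Region {label!r} has invalid bounds: {start}-{end}")
    return sorted(regions, key=lambda region: region[0])


regions = validate_regions([(10, 20, "chorus"), (0, 10, "intro")], duration_s=30)
```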

A new param max_length should be implemented in Audio.py to allow developers to limit the length of Audio that can be processed...

Good point about min_length and max_length. I think it's fine to keep them as separate parameters for ease of reference.
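
A minimal sketch of how separate min_length and max_length parameters might behave, assuming both are measured in seconds and either can be left as None to skip that check. `check_length` is a hypothetical helper for illustration, not the actual implementation in Audio.py.

```python
from typing import Optional


def check_length(
    duration_s: float,
    min_length: Optional[float] = None,
    max_length: Optional[float] = None,
) -> None:
    """Raise ValueError if the clip is shorter or longer than allowed.

    Hypothetical helper; units (seconds) are an assumption.
    """
    # A limit left as None is simply not enforced.
    if min_length is not None and duration_s < min_length:
        raise ValueError(
            f"Audio is too short: {duration_s}s, minimum is {min_length}s"
        )
    if max_length is not None and duration_s > max_length:
        raise ValueError(
            f"Audio is too long: {duration_s}s, maximum is {max_length}s"
        )


# Equivalent to the [None, 20] example in the proposal: only an upper limit.
check_length(12.0, max_length=20)
```

Keeping the two as separate keyword arguments means the "only one limit" case needs no special list syntax.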

@pngwn
Member

pngwn commented Aug 25, 2023

On the point about consistency with the Image components, I wouldn't be too concerned with that as they are so different. Obviously, we should be consistent where it makes sense. I think it is more important that the video and audio are consistent as they are quite similar in terms of functionality (at least they are today).

Regarding edit mode, would it be possible to be in edit mode without needing to click anything? You could scrub the audio by selecting / dragging within the waveform and crop by dragging the handles at the edges.

Everything else sounds great to me! Thanks for putting this together @hannahblair!


8 participants