I have searched to see if a similar issue already exists.
Is your feature request related to a problem? Please describe.
The limitations of Audio have been a frequent point of discussion. With increased usage and 4.0 coming up, now is a good time to address the bugs and feature requests and improve the developer and user experience around it. The key issues are around the user experience: the editing functionality is ambiguous, the slider is finicky to use, the design needs an update, and there are a few trimming/device bugs to tackle. There is also some additional functionality which would be useful to implement.
Describe the solution you'd like
A new Audio component, which has a new UI and additional functionality. This issue serves as a discussion around its updated implementation.
Functionality
Trimming/Cropping
Selecting the edit/pencil/scissors icon will enable the trimming mode
The trimming mode allows you to select a segment of audio
If a particular segment of audio is currently selected, the play icon will play the selected segment
The cropped selection of audio is sent to the prediction function on submit
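As a rough sketch of what "the cropped selection is sent to the prediction function" could mean in practice, the snippet below crops a clip to a selected time range. It assumes the audio arrives as a sample rate plus a sliceable sequence of samples; `crop_audio` and its parameter names are hypothetical and not part of Gradio.

```python
from typing import Sequence, Tuple


def crop_audio(
    sample_rate: int, samples: Sequence, start_s: float, end_s: float
) -> Tuple[int, Sequence]:
    """Return only the samples between start_s and end_s (both in seconds).

    Hypothetical helper for illustration; not Gradio's implementation.
    """
    duration_s = len(samples) / sample_rate
    # Clamp the selection so it always lies inside the clip.
    start_s = max(0.0, min(start_s, duration_s))
    end_s = max(start_s, min(end_s, duration_s))
    # Convert seconds to sample indices.
    start_idx = int(round(start_s * sample_rate))
    end_idx = int(round(end_s * sample_rate))
    return sample_rate, samples[start_idx:end_idx]


# A 5-second clip at 2 samples per second, trimmed to seconds 1 through 3.
rate, cropped = crop_audio(2, list(range(10)), 1.0, 3.0)
```

Slicing works the same way on a NumPy array, so the same logic would apply if the component hands over array data.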
Input/Output
Selecting a device icon should show the available I/O devices to allow users to select devices other than their system defaults.
It should be clear if either the microphone or the speaker permissions are not enabled. This would help combat confusion around whether the Gradio component is actually working or not.
Enforce limits
A new param max_length should be implemented in Audio.py to allow developers to limit the length of Audio that can be processed. This is pretty valuable as developers often want to limit queue load times.
We could also have a min_length so that developers can ensure meaningful predictions are made in their apps.
This could be one param passed in like [5, 20]. We should allow not setting one or the other like [None, 20]. Question on that - what should that be named? We currently just use minimum and maximum for gr.Slider but it should be clear that this is the min + max of what a user of their app can process.
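To make the limit semantics concrete, here is a minimal sketch of the check, assuming durations are measured in seconds and that either bound can be left as `None`. The helper name `check_length` is hypothetical; only `min_length` and `max_length` come from the proposal above, and how they end up being named or passed is still open.

```python
from typing import Optional


def check_length(
    duration_s: float,
    min_length: Optional[float] = None,
    max_length: Optional[float] = None,
) -> None:
    """Raise ValueError if the clip duration falls outside the allowed range.

    Hypothetical helper for illustration; a bound of None means "no limit".
    """
    if min_length is not None and max_length is not None and min_length > max_length:
        raise ValueError("min_length must not be greater than max_length")
    if min_length is not None and duration_s < min_length:
        raise ValueError(f"Audio is too short: {duration_s:.1f}s < {min_length}s")
    if max_length is not None and duration_s > max_length:
        raise ValueError(f"Audio is too long: {duration_s:.1f}s > {max_length}s")


check_length(10.0, min_length=5, max_length=20)     # within [5, 20]
check_length(10.0, min_length=None, max_length=20)  # only an upper bound, like [None, 20]
```

Rejecting the clip before it reaches the prediction function is what would keep over-long uploads out of the queue.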
Label regions
This is probably a feature for either a future iteration or a custom component. An issue has been raised (Apply color labels to regions of Audio component #2023) around labelling certain sections of audio, which is quite an interesting idea. The suggestion is that you can pass a list of tuples like [(start_time_seconds, end_time_seconds, region_label)]. It could look like YouTube's region labelling.
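For illustration, the suggested tuple format could be validated along these lines before rendering. This is only a sketch under the assumption that regions must lie inside the clip; `validate_regions` and the `Region` alias are hypothetical names, not an existing Gradio API.

```python
from typing import List, Tuple

# (start_time_seconds, end_time_seconds, region_label), as suggested in #2023.
Region = Tuple[float, float, str]


def validate_regions(regions: List[Region], duration_s: float) -> List[Region]:
    """Check that each region lies within the clip and return them sorted by start time."""
    for start, end, label in regions:
        if not (0 <= start < end <= duration_s):
            raise ValueError(
                f"Region {label!r} ({start}, {end}) is outside 0..{duration_s}s"
            )
    return sorted(regions, key=lambda region: region[0])


regions = validate_regions([(5.0, 8.0, "chorus"), (0.0, 4.5, "intro")], duration_s=10.0)
```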
Download
@osanseviero mentioned a limitation of Audio Spaces (see here): when a Space outputs a waveform video, the result can only be downloaded as a video, which is frustrating for users who just want the audio file. We currently have to use gr.Video as the output for make_waveform, which means downloading generated audio gives you a video rather than the audio itself. The solution to this (as discussed by Omar, @abidlabs, and @dawoodkhan82) is to allow the Audio component to display the output of make_waveform, which will ensure that the downloaded output is actually the generated audio. Open to thoughts here!
Design
Sound
It'd be cool to rethink the design of the seek slider. An interactive waveform format would be nice but not trivial to recreate. There's a cool JS library, wavesurfer.js, which does this, but I've yet to delve into its details.
Subpoint of this - if we do customise the track, it'd definitely mean a custom element, but we'd ensure it inherits all the behaviour of the HTMLAudioElement API.
Menu
We currently have the kebab menu to access Playback and Download. We could expand this, keep it as is, or remove it and implement the next point:
In line with the planned design for Image (Image proposal #5055), it would be good for consistency to have a similar shared menu which appears below the component when prompted. This would be where device, playback and trimming functionality would be placed.
Trimming/Cropping - The UX of this functionality is a key issue, as right now the slider is tricky to use, a little clunky and could be prettier.
Selecting the scissors icon will enable the trimming mode. This could be done via a range slider, or by just adjusting the ends of the track (see Kapwing's implementation). I prefer the latter, as it avoids some complicated logic around overlapping range nubs.
Change the edit pencil icon to scissors. This seems to be the standard for Audio editors and would be better recognised by users.
Accessibility
Core requirements of an accessible audio player should be implemented: Provide keyboard support, make the keyboard focus indicator visible, provide clear labels, and have sufficient contrast between colours for text, controls, and backgrounds.
Captions. Has this been discussed or requested? The ability to pass a file with captions is a key feature of accessible media players. Likely for another iteration.
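On the contrast requirement, "sufficient" can be checked numerically. The sketch below follows the WCAG 2.x definition of relative luminance and contrast ratio for sRGB colours; the function names are my own, and the 4.5:1 threshold for normal-size text is the WCAG AA level, which this proposal does not itself commit to.

```python
from typing import Tuple

RGB = Tuple[int, int, int]


def _linearize(channel: int) -> float:
    """Convert an 8-bit sRGB channel to its linear value (WCAG 2.x formula)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: RGB) -> float:
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg: RGB, bg: RGB) -> float:
    """Contrast ratio between two colours, from 1 (identical) up to 21 (black on white)."""
    l1, l2 = relative_luminance(fg), relative_luminance(bg)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


# Example: check control text against the player background for the AA text level.
passes_aa = contrast_ratio((0, 0, 0), (255, 255, 255)) >= 4.5
```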
With the Image component refactor underway (#5055), it would be good to stay fairly aligned on the UI. E.g., the proposed menu at the bottom could be a shared interface, and down the line, the Video component could use this too. That said, a lot of this functionality could actually be shared with the Video component (see #3855). It could be cool to merge the two at some point.
All in all, these changes are fairly straightforward and this isn't a complex redesign (🤞). The key here is addressing the user experience of the component and the addition of core media player functionality (though download and region labelling may require some thinking).
Do let me know if there's anything you'd like to discuss or ask, and anything I've forgotten to address! I'll update this issue with developments as I go.
Really comprehensive proposal @hannahblair! I think you've covered everything that's been on my radar.
Captions. Has this been discussed or requested? The ability to pass a file with captions is a key feature of accessible media players. Likely for another iteration.
This hasn't been requested as far as I know, so we could potentially add this after 4.0 using a similar API as subtitles in the gr.Video component (definitely makes sense to add captions to make the component accessible.)
The solution to this (as discussed by Omar, @abidlabs, and @dawoodkhan82) is to allow the Audio component to display the output of make_waveform, which will ensure that the downloaded output will actually be the generated audio. Open to thoughts here!
I suppose a general solution here is to allow the gr.Audio component to play video files if passed in -- but if you use it programmatically (e.g. via the client), you'll only get the audio file.
Label regions: this is probably a feature for either a future iteration or a custom component
Feels like a custom component to me
A new param max_length should be implemented in Audio.py to allow developers to limit the length of Audio that can be processed...
Good point about min_length and max_length. I think it's fine to keep them as separate parameters for ease of reference.
On the point about consistency with the Image components, I wouldn't be too concerned with that as they are so different. Obviously, we should be consistent where it makes sense. I think it is more important that the video and audio are consistent as they are quite similar in terms of functionality (at least they are today).
Regarding edit mode, would it be possible to be in edit mode without needing to click anything? You could scrub the audio by selecting / dragging within the waveform and crop by dragging the handles at the edges.
Everything else sounds great to me! Thanks for putting this together @hannahblair!
References
app.mediabits.io
playplay.com
kapwing.com
Audio Issues
Feature Requests
Bugs
gr.Audio causes error if the value was set from the beginning #5299
Related