proposal: x/mobile audio #13432

rakyll · 2015-11-30T21:55:51Z

In the scope of the Go mobile project, an audio package that supports decoding and playback is a top priority. The current status of audio support under x/mobile is limited to OpenAL bindings and an experimental high-level audio player that is backed by OpenAL.

The current experimental audio package fails to

provide high level abstractions to represents audio and audio processors,
implement a memory-efficient playback model,
implement decoders (e.g. an mp3 decoder),
support live streaming or other networking audio sources.

In order to address these concerns, I am proposing core abstractions and a minimal set of features based on the proposed abstractions to provide decoding and playback support.

See the proposal document for further details.

gopherbot · 2015-11-30T22:03:13Z

CL https://golang.org/cl/17262 mentions this issue.

mattetti · 2015-12-01T04:54:05Z

It looks great!

How about adding AudioFormat string to the Clip interface? For instance a wav audio file can be encoded in a variety of coding formats. It might not belong to the interface since it's a detail that maybe only the decoder cares but I can think of a few examples where it would be useful.

WAV and AIFF both support float PCM values even though the use of float values is unpopular. Should we consider supporting float values?

I think we should, any kind of audio processing would more than likely done in float values, using float right away seems to be a better move IMHO (not a deal breaker tho).

Decoding on desktop. The package will use the system codec libraries provided by Android and iOS on mobile devices. It is not possible to provide feature parity for desktop envs in the scope of decoding.

We can implement the decoders in pure Go code for when codecs are not available. I have partially implemented such decoders and while they might not be as fast as dedicated/optimized codecs, I think they should work well enough.

mattetti · 2015-12-01T05:09:57Z

L117 small typo, Clips represents PCM audio should read Clips represent PCM audio

mattetti · 2015-12-01T06:46:35Z

Question, why would we have a generic DecodeWAVBytes decoder function? It seems very specific to wav decoders only, doesn't it? DecodeBytes might be a better choice (?)

mattetti · 2015-12-01T08:31:28Z

I created a quick and dirty but conform aiff decoder to get a feel for the API: https://github.com/mattetti/exp/blob/master/audio/aiff/decoder.go

I think it might be missing an API to get frames out of a clip. The current proposal goes all the way to get the clip and frame info but we can't get the actual PCM data (which would probably some sort of [][int]float format). Depending on the use case, we might not want to load the entire audio file in memory, so the API might want to take that in consideration.

What do you think?

rakyll · 2015-12-01T20:09:21Z

Thanks for the feedback!

How about adding AudioFormat string to the Clip interface? For instance a wav audio file can be encoded in a variety of coding formats. It might not belong to the interface since it's a detail that maybe only the decoder cares but I can think of a few examples where it would be useful.

The AudioFormat would represent the coding format of the original source, not the Clip's. The decoders won't work with Clips but arbitrary encoded streams of bytes and convert them into a Clip. The decoder can return the audio format of the original source. Consider the following decoder function:

func Decode(src io.ReadSeeker) (clip Clip, format string, err error)

I think we should, any kind of audio processing would more than likely done in float values, using float right away seems to be a better move IMHO (not a deal breaker tho).

This is unfortunately coming with a performance cost. We need to find a way to support float values but also allow integer-only values for faster processing for those who doesn't care about float-level precision.

We can implement the decoders in pure Go code for when codecs are not available.

We won't have enough human resources probably to reimplement decoders in Go, therefore using the available codecs is going to be our initial step. The proposal is not against of a decoder being implemented in vanilla Go though.

Question, why would we have a generic DecodeWAVBytes decoder function? It seems very specific to wav decoders only, doesn't it? DecodeBytes might be a better choice (?)

It was a sample for decoder implementations, I was not proposing it to be in the audio package. DecodeBytes is a better name in any case, I will update the proposal.

I think it might be missing an API to get frames out of a clip.

We had an earlier debate about providing APIs that work with frames rather than byte slices. The previous API was in the lines of what's below.

type Frame []int64

type Clip interface {
     Read(f []Frame, offset int64) (n int, err error)
}

There are two arguments against the frame-based APIs:

It is easy to calculate how long a frame is, therefore working with byte slices doesn't require much additional work.
Read(f []Frame, offset int64) (n int, err error) almost reads like io.ReadSeeker. Not introducing a new interface type allows us to reuse utility functions from the stdlib on Clip instances (such as io.Copy).

mattetti · 2015-12-01T20:46:10Z

My only concern is that a frame from a wav file can contain data using different encoding and non PCM formats will definitely not have PCM frames. For instance, if one would be to write a mp3 decoder, would the decoder have to provide a clip with an io.ReadSeeker that would return PCM data only?
If that's the case, you might want to make that a little bit clearer.

rakyll · 2015-12-02T18:45:38Z

For instance, if one would be to write a mp3 decoder, would the decoder have to provide a clip with an io.ReadSeeker that would return PCM data only?

Correct. You can read only and only PCM formatted data from clips. A decoder is responsible to decode any input down to a PCM formatted frames. I am going to update the document to make it clearer.

Updates golang/go#13432. Change-Id: I718006e8f039c476d456c1276c54132bd66d9410 Reviewed-on: https://go-review.googlesource.com/17262 Reviewed-by: Burcu Dogan <jbd@google.com>

mewmew · 2015-12-09T13:36:39Z

provide high level abstractions to represents audio and audio processors,

Some work has already been done in this domain, by the audio package of Azul3d. It is intended to define interfaces analogous to the Image interface of the standard library for audio.

Perhaps some ideas could be stolen from its design. @slimsag is the original developer, and he may have some insight into problems encountered when designing the interface, and how they were solved by Azul3d.

implement decoders (e.g. an mp3 decoder),

There exist at least two FLAC decoders in pure Go.

The second FLAC package is conceptually a back-end for decoding FLAC audio files, for which one or more front-end may be implemented. Currently a front-end has been implemented for the Azul3d audio abstraction, and it should be trivial to implement another front-end for the Go mobile audio abstraction. Analogously to the image package of the standard library, to decode a FLAC audio stream, it suffices to have

import _ "azul3d.org/audio/flac.v0"

in a program's main package.

Below follows a short example program for decoding FLAC audio files with Azul3d:

package main

import (
    "flag"
    "fmt"
    "log"
    "os"

    "azul3d.org/audio.v1"
    _ "azul3d.org/audio/flac.v0" // Add FLAC decoding support.
)

func main() {
    flag.Parse()
    for _, path := range flag.Args() {
        err := parseFLAC(path)
        if err != nil {
            log.Fatal(err)
        }
    }
}

func parseFLAC(path string) error {
    // Open file.
    f, err := os.Open(path)
    if err != nil {
        return err
    }
    defer f.Close()

    // Create decoder.
    dec, _, err := audio.NewDecoder(f)
    if err != nil {
        return err
    }

    // Decode audio stream.
    for {
        samples := make(audio.PCM32Samples, 1024)
        n, err := dec.Read(samples)
        if err != nil {
            if err == audio.EOS {
                return nil
            }
            return err
        }
        fmt.Println(samples[:n])
    }
}

I hope this may give some ideas, and hope improve the API and design of the Go mobile audio package.

Cheers /u

rakyll · 2016-02-16T19:55:28Z

Thanks for the input. The proposal is currently on hold and require more work on a variety of topics:

The interfaces need to consider VBR compression.
Seeking is often expensive, we may adopt an io.ReadSeeker-like interface to make Seek a standalone operation to put some more emphasis on its cost.
It is not clear how big a buffer is required for a frame for VBR encoded media. A Frame type may be to introduced.
We need working prototype decoders for AIFF, FLAC, MP3 and OGG before finalizing the proposal.

rakyll · 2016-07-15T00:59:34Z

@mattetti and I have been working on audio during GopherCon and will update the proposal soon with our most recent ideas.

Seeking within the audio source is not always a trivial problem, e.g. the cases where the underlying source is encoded with a VBR compression algorithm, optimistic seeking and slurping the entire data source may be required. Separation of the Seek method allows us to explain the requirements from the implementors more easily. Since different encoders will have different stragies to Seek, it is also easy to document the cost and the underlying algorithm to the users. Also, the seperation helps us document the requirements of implementing an efficient Clip.Frames. We are explecting decoders to have a cursor on the data source and move forward as new Frames calls arrive. We only expect them to modify the cursor if seeking is required. Without a separate Seek method, implementors need to check if the offset is matching with their current internal cursor and seek if it is not. Separation saves them figuring out the conditional case for a requirement of seeking. Change-Id: I0b15d56df9457a462953aaaf915445e268462f97 Reviewed-on: https://go-review.googlesource.com/24702 Reviewed-by: David Crawshaw <crawshaw@golang.org>

mewmew · 2016-08-11T01:02:07Z

@mattetti and I have been working on audio during GopherCon and will update the proposal soon with our most recent ideas.

@rakyll Lovely to hear. Any updates on this? Would be interested to see what you have been playing with.

mattetti · 2016-08-11T01:11:39Z

@mewmew we decided to reduce the scope of the proposal to an interface allowing reading and writing of PCM data. We are currently working on various test implementations covering multiple use cases (audio analysis, processing and playback) with a few different formats and platforms.

The big challenge is defining an interface that is multi-purpose yet flexible enough to support various kind of codecs. As we started implemented our ideas, we started finding issues/challenges and went back to the drawing board. As soon as we have decent implementations validating our design, we will submit an updated proposal.

mewmew · 2016-08-11T15:19:16Z

@mewmew we decided to reduce the scope of the proposal to an interface allowing reading and writing of PCM data. We are currently working on various test implementations covering multiple use cases (audio analysis, processing and playback) with a few different formats and platforms.

The big challenge is defining an interface that is multi-purpose yet flexible enough to support various kind of codecs. As we started implemented our ideas, we started finding issues/challenges and went back to the drawing board. As soon as we have decent implementations validating our design, we will submit an updated proposal.

Sounds great! Glad to hear that you are fleshing out these details. I know quite a few Gophers in the community that would be happy to stress test the API of these interfaces. Are the preliminary interfaces available in a public repository by any chance?

Edit: To provide some additional background. The audio project of the Azul3D engine would be happy to see where and how these interfaces fit into the "generic" audio package, which is analogous to the image package of the standard Go library. My brother and I would also be interested in looking into how difficult it would be to add support for FLAC decoding, targetting these interfaces. Then there are quite a few music players, audio visualizers and other such software in the community which would like to use the audio API. So you have a few stress testers lined up :)

rsc · 2016-12-19T21:07:28Z

On hold for #17244.

hraban · 2016-12-29T00:13:15Z

Hi, bit late to the party but I have a small extra data point: the Opus API. (I maintain a wrapper around xiph.org's standard implementations, libopus and libopusfile; https://github.com/hraban/opus. The API mirrors the C API, in Go flavor.)

One note I didn't see explicitly covered (sorry if I missed it) is the difference between a wrapper format like Ogg, and an actual encoding like Opus or Vorbis. If I get a .opus file, that's not raw opus data, that's an ogg stream with opus data inside it. However, if I pass a bunch of PCM frames to an Opus encoder, I don't get an opus stream: I get opus bytes. This is particularly relevant if you don't control the source; if something is feeding you raw (unwrapped) bytes of an encoded audio stream you can't pass it to a decoder expecting it to be wrapped in Ogg or whatever.

On floats: they're great when you're mixing the audio because they don't clip. You can mix up, down, sideways, and scale back without special checks nor losing data. They have their place, it's nice to have an extra FramesFloat32() option which decodes to float32. The Opus decoder supports this natively.

And a final question: is the byte slice returned by Frames() raw pcm? then it should be int16[] (the float debate notwithstanding).

mattetti · 2016-12-29T00:22:54Z

Update: with the help of @rakyll and @campoy I started working on a more generic audio bugger interface that could be used by codecs, analyzers, effects etc... The code isn't done but the idea would be to create a proposal around the minimal useful interface: https://github.com/go-audio/audio
I implemented wav and riff codecs under the same organization: https://github.com/go-audio

@hraban I'd love to hear your perspective on the audio.Buffer interface and if it would work for Opus. Ideally all audio libraries should be able to work nicely together by exchanging audio buffers.

bradfitz · 2016-12-29T00:24:29Z

The API mirrors the C API, in Go flavor

It's pretty Go-like, but still fairly C-like in places. Ditch the underscore variables, ditch the ALL_CAPS errors and APPLICATION_VOIP etc (https://github.com/golang/go/wiki/CodeReviewComments#mixed-caps), don't have exported variables of unexported type (opusError, opusFileError), use shorter variables names when it's obvious (func NewEncoder(...., application Application) could be app Application, etc).

hraban · 2016-12-29T15:07:15Z

@bradfitz thanks for the great feedback. I've made the changes to the API. I'll look at the private variables another time.

@mattetti I've been thinking about your api and I realised a mistake in my assumptions: PCM doesn't always have to be 16 bit. However, on the flip side, it almost always is. I'm generally very eager to have these codecs be as efficient as possible, because for my specific use case I was trying to handle as many as I could (mixing multiple audio streams, it was cpu bound). This is why I'm very keen on "hot potato" buffer handling, meaning that I wanted to allocate two circular buffers when a client connects (a float32[] for pcm and a byte[] for encoded data), and keep reusing that and passing it to the codec. xiph.org's libopus(file) API allows for this very nicely, and I was never doing any copying or manual converting. If you're going to be making a fundamental audio API for all Go programs to use, I'd definitely urge you to allow implementations to support float pcm and int (of any width) pcm without copying.

That said, I'll have to take a closer look another time, and actually try and implement it to see where it gets me. I only know Vorbis and Opus by xiph.org, and it's very clear they learned from their (numerous) mistakes in the former when designing the API for the latter.

mattetti · 2016-12-30T02:18:49Z

@hraban you are absolutely right that performance is important in a lot of cases, that's why I implemented 2 buffers: an int and a float buffer. You can easily implement a byte buffer and exchange the data between the 2. I used float64 instead of float32 mainly because there is little overhead and because for signal processing, might as well use float64. That said, we could add a float32 buffer if that made sense. The buffers are designed to be reused so we don't have to relocate them. I believe I provided a few examples with the PCM codecs (in most cases, we don't want to load an entire file in memory and prefer to read chunks at a time and dropping the PCM data in a reused buffer which can then be processed before being reused).

hraban · 2016-12-30T03:13:37Z

What I mean is you want to feed your buffers to the codec, and tell it to decode directly into them. A float64 buffer won't fly with libopus nor libvorbis†, because they use float32. That means you will be doing at least one conversion loop there (for i, f32 := range encodedBuf { goBuf[i] = float64(f32) }) for every audio stream you're handling. Two if you're encoding the data again to send back. Also, float64 will double everything in size which can be bad for your cache, and disregarding cache it will just make you wait on memory that much longer. Especially for 16-bit pcm (which is pretty common), with float32 you've got quite a lot of legroom already (the mantissa + sign bit alone are 24 bits which give you a factor 256).

Of course, libopus and libvorbis are just two libraries written in C, they're not gospel; once you get a pure golang implementation you can do what you want... I'm just not sure that even for a pure golang implementation it would be wise to go float64. I don't think you will ever need that precision, and it comes quite a steep cost: halving performance of everything capped by memory: cache, RAM bandwidth.

I'm coming at this from the perspective of a server-side app dealing with a lot of simultaneous audio streams. Think TeamSpeak. For this I can tell you: you want that data in and out asap. and every byte you can squeeze out of a session's footprint is room for another session to cram in. If you're having to convert between your codec's float32 array and an internal float64 array, that's immediately a cost you'd rather forego. Same for int vs int16.

For mobile, I have no idea. Never touched mobile dev in my life.

† can't mention libvorbis here without clarifying they don't actually support this *pcm buffer passing---they allocate a buffer internally which you have to copy out of before your next frame decode. Still, that's a memcpy, rather than a for loop with type conversion. And if you really wanted, you could have your libvorbis wrapper do the same, using unsafe. If you really really wanted. Hopefully not, but the point is it's nice to leave people the choice. When it comes to audio, every cycle counts.

mattetti · 2016-12-30T05:07:48Z

@hraban those are fair points, many DAWs and audio plugins do process the audio signal in float64 and Go std library is quite partial to this format. That said I agree that float32 is very common and I added it to the WIP proposal. That change required that I update the interface to:

type Buffer interface {
	// PCMFormat is the format of buffer (describing the buffer content/format).
	PCMFormat() *Format
	// NumFrames returns the number of frames contained in the buffer.
	NumFrames() int
	// AsFloatBuffer returns a float 64 buffer from this buffer.
	AsFloatBuffer() *FloatBuffer
	// AsFloat32Buffer returns a float 32 buffer from this buffer.
	AsFloat32Buffer() *Float32Buffer
	// AsIntBuffer returns an int buffer from this buffer.
	AsIntBuffer() *IntBuffer
	// Clone creates a clean clone that can be modified without
	// changing the source buffer.
	Clone() Buffer
}

Because if an API says it would take a generic audio.Buffer interface, it might need to convert it to float32. Now an API, like yours might only accept audio.Float32Buffer to avoid any lookup or potential conversion code. Here is the definition of the audio.Float32Buffer:

// Float32Buffer is an audio buffer with its PCM data formatted as float32.
type Float32Buffer struct {
	// Format is the representation of the underlying data format
	Format *Format
	// Data is the buffer PCM data as floats
	Data []float32
}

As you can see, the buffer is just a wrapper around a slice. The buffer has a few convenient methods to convert its data but it is designed so you can reuse it as often as you want. For instance you could allocate the float buffer of a certain size and keep decoding into it, send it over to a transform method and get the buffer back to then re-encode it.

My goal here is to define a common API that all audio libraries could use. @rakyll and I really quickly realized this was really hard because of the various use cases and concerns involved. But wouldn't be amazing if we could use the same interface and chain all those audio libs together?

I do believe this change should address your concern, but please let me know if that's not the case. Suggestions welcome too!

hraban · 2016-12-30T11:23:09Z

I like where this is going. There is certainly room for a unified audio codec api done right. There are many similarities, and if all you care about is handling some PCM data and just codec'ing at the edges, the specific codec used should be completely pluggable. I'll try to implement this interface in the opus lib, see if I run into anything else. Well, besides PCM int16 :) Cheers!

…

On Fri, Dec 30, 2016 at 5:09 AM, Matt Aimonetti ***@***.***> wrote: @hraban <https://github.com/hraban> those are fair points, many DAWs and audio plugins do process the audio signal in float64 and Go std library is quite partial to this format. That said I agree that float32 is very common and I added it to the WIP proposal. That change required that I update the interface to: type Buffer interface { // PCMFormat is the format of buffer (describing the buffer content/format). PCMFormat() *Format // NumFrames returns the number of frames contained in the buffer. NumFrames() int // AsFloatBuffer returns a float 64 buffer from this buffer. AsFloatBuffer() *FloatBuffer // AsFloat32Buffer returns a float 32 buffer from this buffer. AsFloat32Buffer() *Float32Buffer // AsIntBuffer returns an int buffer from this buffer. AsIntBuffer() *IntBuffer // Clone creates a clean clone that can be modified without // changing the source buffer. Clone() Buffer } Because if an API says it would take a generic audio.Buffer interface, it might need to convert it to float32. Now an API, like yours might only accept audio.Float32Buffer to avoid any lookup or potential conversion code. Here is the definition of the audio.Float32Buffer: // Float32Buffer is an audio buffer with its PCM data formatted as float32.type Float32Buffer struct { // Format is the representation of the underlying data format Format *Format // Data is the buffer PCM data as floats Data []float32 } As you can see, the buffer is just a wrapper around a slice. The buffer has a few convenient methods to convert its data but it is designed so you can reuse it as often as you want. For instance you could allocate the float buffer of a certain size and keep decoding into it, send it over to a transform method and get the buffer back to then re-encode it. My goal here is to define a common API that all audio libraries could use. @rakyll <https://github.com/rakyll> and I really quickly realized this was really hard because of the various use cases and concerns involved. But wouldn't be amazing if we could use the same interface and chain all those audio libs together? I do believe this change should address your concern, but please let me know if that's not the case. Suggestions welcome too! — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#13432 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAIafLnOaWjMdyDaTHELkZPThjsawt-Dks5rNJHtgaJpZM4Grzt2> .

rakyll · 2017-01-04T19:36:33Z

Closing this proposal on the behalf of #18497, let's keep the conversation on the new proposal.

rakyll added the Proposal label Nov 30, 2015

rakyll self-assigned this Nov 30, 2015

gopherbot pushed a commit to golang/proposal that referenced this issue Dec 8, 2015

design: audio for mobile

3e77f0d

Updates golang/go#13432. Change-Id: I718006e8f039c476d456c1276c54132bd66d9410 Reviewed-on: https://go-review.googlesource.com/17262 Reviewed-by: Burcu Dogan <jbd@google.com>

rsc changed the title ~~proposal: mobile audio~~ proposal: x/mobile audio Dec 28, 2015

rsc added this to the Proposal milestone Dec 28, 2015

rakyll mentioned this issue Jul 15, 2016

proposal: Add an x/media package #16353

Open

mewmew mentioned this issue Aug 11, 2016

Seek support mewkiz/flac#16

Closed

adg mentioned this issue Sep 27, 2016

proposal: decide policy for sub-repositories #17244

Open

rsc added the Proposal-Hold label Dec 19, 2016

hraban mentioned this issue Dec 29, 2016

Brad Fitzpatrick feedback on naming hraban/opus#7

Merged

mattetti referenced this issue in go-audio/audio Dec 30, 2016

add float32 buffer

8503dc9

hraban mentioned this issue Dec 30, 2016

added 3 more setter / getter for Encoder Settings hraban/opus#8

Closed

mattetti mentioned this issue Jan 3, 2017

proposal: audio package #18497

Closed

rakyll closed this as completed Jan 4, 2017

golang locked and limited conversation to collaborators Jan 4, 2018

gopherbot added the FrozenDueToAge label Jan 4, 2018

rsc unassigned rakyll Jun 23, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

proposal: x/mobile audio #13432

proposal: x/mobile audio #13432

rakyll commented Nov 30, 2015

gopherbot commented Nov 30, 2015

mattetti commented Dec 1, 2015

mattetti commented Dec 1, 2015

mattetti commented Dec 1, 2015

mattetti commented Dec 1, 2015

rakyll commented Dec 1, 2015

mattetti commented Dec 1, 2015

rakyll commented Dec 2, 2015

mewmew commented Dec 9, 2015

rakyll commented Feb 16, 2016

rakyll commented Jul 15, 2016

mewmew commented Aug 11, 2016

mattetti commented Aug 11, 2016

mewmew commented Aug 11, 2016 •

edited

Loading

rsc commented Dec 19, 2016

hraban commented Dec 29, 2016

mattetti commented Dec 29, 2016

bradfitz commented Dec 29, 2016

hraban commented Dec 29, 2016

mattetti commented Dec 30, 2016

hraban commented Dec 30, 2016 •

edited

Loading

mattetti commented Dec 30, 2016

hraban commented Dec 30, 2016 via email

rakyll commented Jan 4, 2017

proposal: x/mobile audio #13432

proposal: x/mobile audio #13432

Comments

rakyll commented Nov 30, 2015

gopherbot commented Nov 30, 2015

mattetti commented Dec 1, 2015

mattetti commented Dec 1, 2015

mattetti commented Dec 1, 2015

mattetti commented Dec 1, 2015

rakyll commented Dec 1, 2015

mattetti commented Dec 1, 2015

rakyll commented Dec 2, 2015

mewmew commented Dec 9, 2015

rakyll commented Feb 16, 2016

rakyll commented Jul 15, 2016

mewmew commented Aug 11, 2016

mattetti commented Aug 11, 2016

mewmew commented Aug 11, 2016 • edited Loading

rsc commented Dec 19, 2016

hraban commented Dec 29, 2016

mattetti commented Dec 29, 2016

bradfitz commented Dec 29, 2016

hraban commented Dec 29, 2016

mattetti commented Dec 30, 2016

hraban commented Dec 30, 2016 • edited Loading

mattetti commented Dec 30, 2016

hraban commented Dec 30, 2016 via email

rakyll commented Jan 4, 2017

mewmew commented Aug 11, 2016 •

edited

Loading

hraban commented Dec 30, 2016 •

edited

Loading