Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Playback issue with 24 bitsPerSample audio content #23

Closed
mehrvarz opened this issue Aug 15, 2018 · 22 comments
Closed

Playback issue with 24 bitsPerSample audio content #23

mehrvarz opened this issue Aug 15, 2018 · 22 comments

Comments

@mehrvarz
Copy link

Hi, unfortunately I am not able to play back 24/96 flac files. Such content usually gives me an "Output underflowed" error message or just trash noise. Here some free 24bps content:

http://www.lindberg.no/hires/test/2L-125_stereo-88k-24b_04.flac

When I call OpenDefaultStream(), do I have to hand over a buffer created with make([]portaudio.Int24,framesPerBuffer) instead of make([]int16,framesPerBuffer)? Anything else I should be aware of when trying to play back 24bps audio content? Thank you.

@gordonklaus
Copy link
Owner

Hi @mehrvarz . PortAudio works only with PCM data, not with encodings like FLAC or MP3. You'll need to find decoder library in order to convert your FLAC into PCM before passing it to PortAudio.

@mehrvarz
Copy link
Author

Thank you for responding. I can decode all my 16bit and 24bit flac files just fine. But so far I can only play back the decoded 16bit data via your library. I am unable to correctly push out the decoded 24bit data via portaudio. Is this supposed to work? Have you ever done this?

When I use VLC, I can play back my 24/96 flac files just fine via my 24/96 USB DAC (aka bitperfect). I would like to do the same from within my Go app. Do I need to use "portaudio.Int24" to create the buffer I habd over to OpenDefaultStream()? Is there anything else that is different from playing back 16bit PCM? Thank you.

@gordonklaus
Copy link
Owner

Yes, if you have 24-bit audio then you must use portaudio.Int24 for your buffer. I've never actually done it but I assume it's as straightforward as the other sample types. You will probably need to use the encoding/binary package to get your decoded audio into a []portaudio.Int24 buffer.

@mehrvarz
Copy link
Author

I am using encoding/binary for both 16bit and 24bit PCM. The code below works well for 16bit PCM. It does not for 24bit. (Take note: "if bytesPerSample>=3")

`

	frame, err := flacstream.ParseNext()
	if err != nil {
		if err == io.EOF {
			log.Debug("PlayFlac EOF")
		} else {
			log.Warning("PlayFlac error flacstream.ParseNext() "+err.Error())
		}
		break
	}

	audio := make([]byte, 8192*4)
	j := 0
	for i := 0; i < int(frame.BlockSize); i++ {
		for _, subframe := range frame.Subframes {
			sample := subframe.Samples[i]
			if bytesPerSample>=3 {
				audio[j] = uint8((sample>>16)&0xff)
				j++
			}
			audio[j] = uint8((sample>>8)&0xff)
			j++
			audio[j] = uint8(sample&0xff)
			j++
		}
	}

	err = binary.Read(bytes.NewBuffer(audio[0:j]), binary.BigEndian, out)
	if err != nil {
		// "unexpected EOF" = not enough audio-data for out
		break
	}

	err = stream.Write()
	if err != nil {
		// "Output underflowed"
		break
	}

`
No matter what I try, with 24bit content I either get "unexpected EOF" from binary.Read(), or "Output underflowed" from stream.Write(), or scrambled audio noise.

Maybe I am handling 16/24 bps correctly and my problem is caused by the 96khz sampleRate? Do I have to cope with different sampleRates in different way?

The problem could also be caused by how I use framesPerBuffer. I don't fully understand how to set this up correctly. 8192 words works well for 44.1khz/16bit content. Do I need to use a different value for (decoded) 24/96 content?

"I've never actually done it" - Wish you didn't say this :-)

@gordonklaus
Copy link
Owner

I wish I could be of more help. It would be easier if you shared a complete "working" piece of code.

Is the endianness correct?

Since you're extracting the bytes manually from the samples already, it seems unnecessary to also use encoding/binary. Maybe the source of error is in there and simplifying with remove it.

I don't know why it would make a difference, but you could try passing non-interleaved frames to portaudio, i.e. [][]portaudio.Int24.

Unexpected EOF from binary.Read I would only expect if j is not a multiple of 3. The code you posted doesn't seem to have that problem.

The docs indicate that output underflow may be caused by the stream callback taking too much (or in your case, I suppose, the time before you call stream.Write). If you decode the audio first and then open stream afterward, does this error go away?

Scrambled audio noise suggests wrong endianness or incorrectly interleaved samples. Try a mono file?

I don't think 96kHz is the problem, the code should work pretty independently from that (unless it is too CPU intensive). Same goes for framesPerBuffer. You could try increasing it but 8192 seems like plenty.

@gordonklaus
Copy link
Owner

@mehrvarz Another thought: Try re-encoding the decoded data and playing in VLC just to be sure the FLAC library is working properly.

@mehrvarz
Copy link
Author

You may be right. Take a look at this: mewkiz/flac#30
Please let us put this thread on hold until I resolved my flac decoding issue. Thank you very much.

@mehrvarz
Copy link
Author

mehrvarz commented Aug 16, 2018

"output underflow may be caused by the stream callback taking too much (or in your case, I suppose, the time before you call stream.Write)"

I doubt it. This code does not even generate 1% cpu load. It is not a performance issue. More likely I am getting "underflow" because I am not handing over enough data before stream.Write(). I'm still mystified by the number of bytes/words/Int24's I have to hand over. And whether the bits needs to be packed (as in 4 x 24 = 3 x int32) or not. A portaudio expert would surely know this? (edited typo)

@gordonklaus
Copy link
Owner

I wouldn't call myself a PortAudio expert but neither is it a complicated API and I'll do my best to help.

The documentation for Buffer should make it clear what you need to do. You shouldn't need to do any special packing because a []portaudio.Int24 is already byte-packed (i.e. unsafe.Sizeof([N]portaudio.Int24) == 3*N).

To isolate whether the sample type is the issue, try using a different sample type (e.g., int32 or float32). To do the sample conversion manually, multiply the int32s you get from your FLAC decoder by 1<<8 (for int32 output) or divide by 1<<24 (for float32).

I can probably give you better help if you share your code.

@mehrvarz
Copy link
Author

mehrvarz commented Aug 17, 2018

Using int32 and shifting the decoded data left bound (<<8 for 24bit content and <<16 for 16bit content) brought me a major step forward. Thank you very much for this very good idea. One other thing that solved a major issues for me (with certain flac files containing unusual header block sizes), was to treat the 4th argument of OpenDefaultStream() not as framesPerBuffer, but as the actual byte size of the output buffer, which is being handed over as 5th argument. So instead of framesPerBuffer = (frame.BlockSize * 2), I am now handing over the actual byteSizeOfOutputBuffer = (frame.BlockSize * 2 * bytesPerSample) via the 4th argument. At the same time it appears to be absolutely crucial for outputBuffer to have the exact size of framesPerBuffer. (Edit: ...as number of elements not bytes!)

In other words: frames-Per-Buffer is being handed over, but only implicitly by allocating the exact size for outputBuffer. And the way you have named the 4th argument is not fully correct. I could be wrong but please take a look and lmk what you think. Thx!

OK, you may find this interesting: I ported my flac player over to ARM and have it running on my RPi-2 now. Here my cpu load numbers (I don't think 24/96 playback will work on the zero, but I don't have one to test with):

  audio source 16/44:  playflac 13-23%  pulseaudio 1-2%   combined top: 25%
  audio source 24/96:  playflac 32-50%  pulseaudio 6-9%   combined top: 59%

That 24/96 is sounds excellent, btw.

@gordonklaus
Copy link
Owner

gordonklaus commented Aug 17, 2018

I'm glad to hear that your code is working now.

I wrote a small example to check if Int24 is working properly and it seems so. One problem in the original snippet you shared was the byte order (not in the call to binary.Read but in your manual byte extraction); it should have been little endian. I will update the documentation to make this clear.

Regarding framesPerBuffer, I'm quite sure the extra multiplier you're seeing is not bytesPerSample but something else like numChannels. In my example code if I pass framesPerBuffer larger than the actual buffer length (in samples, not bytes) I get an "Output underflowed" error.

I'll close this issue with the documentation fix. If you can provide a complete working example (not just a snippet) suggesting a framesPerBuffer problem, please provide it in a separate issue.

Here is the example I wrote to test Int24:

package main

import (
	"fmt"
	"math"
	"os"
	"os/signal"

	"github.com/gordonklaus/portaudio"
)

func main() {
	fmt.Println("Playing.  Press Ctrl-C to stop.")

	sig := make(chan os.Signal, 1)
	signal.Notify(sig, os.Interrupt, os.Kill)

	portaudio.Initialize()
	defer portaudio.Terminate()
	out := make([]portaudio.Int24, 8192)
	sampleRate := 44100.0
	stream, err := portaudio.OpenDefaultStream(0, 1, sampleRate, len(out), &out)
	chk(err)
	defer stream.Close()

	chk(stream.Start())
	defer stream.Stop()
	phase := 0.0
	step := 256 / sampleRate
	for {
		for i := range out {
			x := int32(math.MaxInt32*math.Sin(2*math.Pi*phase))
			_, phase = math.Modf(phase + step)

			out[i].PutInt32(x)
		}
		chk(stream.Write())
		select {
		case <-sig:
			return
		default:
		}
	}
}

func chk(err error) {
	if err != nil {
		panic(err)
	}
}

@gordonklaus
Copy link
Owner

Fixed in e7b6b93

@gordonklaus
Copy link
Owner

@mehrvarz It turns out that PortAudio expects Int24s to be encoded with native byte ordering and thus the code I provided above was not portable. So I added the PutInt32 method in 00e7307 and updated the example above.

@mehrvarz
Copy link
Author

mehrvarz commented Aug 18, 2018

I didn't know that type Int24 [3]byte accepts PutInt32(x). Amazing. The least significant byte is cut off?

I changed sampleRate to 96000.0 and it continues to work. This is something I don't fully understand. Where do the bits come from? This also makes me wonder: if you don't have a 24bit capable sound card and this code does work for you, then how do we know that 24bit audio is actually being delivered to the sound card? It would be more convincing if it would only sound well on a 24bit capable sound card and would fail otherwise.

I think it could very well be that your code only delivers plain 16bit audio content always. My code is using int32 and it definitely does not generate 32bit audio. I am actually speculating that some component is cutting off the lower 8 bits. Where exactly do we mention "dear pulseaudio, please treat this as 24bit encoded audio during playback" like we explicitly mention the desired sampleRate?

On Linux, when you enter cat /proc/asound/card1/stream0 (or card0 or card2...) you should see your "Momentary freq" and "Format" (16 or 24bit). What do you get while "sinus" is playing? What do you see if you use cvlc to play back the flac I referenced in my first post? (edited typos)

@mehrvarz
Copy link
Author

update 1: I found your PutInt32() implementation. I first thought this is a Go internal (or stdlib) thing.

update 2: I found reflect.TypeOf(Int24{}) in sampleFormat(). Looks like the type of elements in the buffer is significant (my early assumption). So I updated my code to now use two different buffer arrays ([]int16 and []Int24) depending on the bitsPerSample I get from the decoder. This is how I shuffle the data now in 24 bit mode. Samples[] is of type int32.

	for i := 0; i < int(frame.BlockSize); i++ {
		outbuf24[j].PutInt32(frame.Subframes[0].Samples[i] << 8); j++
		outbuf24[j].PutInt32(frame.Subframes[1].Samples[i] << 8); j++
	}

There is an entity in the audio system that can upscale or downscale the bps in the live data - if necessary. But if the content is 24 bit and the sound card is too, no conversion should take place. I think I got this now for all cases. It is good that you implemented PutInt32() and provided sample code. Thank you a lot. We can close this now :-)

@mehrvarz
Copy link
Author

Hello, I have one more request. I hope you don't mind.

Further up I mentioned my CPU load on the Raspberry:

audio source 24/96: playflac 32-50% pulseaudio 6-9% combined top: 59%

Since I switched to Int24 and PutInt32() my load has jumped up:

audio source 24/96: playflac 32-63% pulseaudio 6-9% combined top: 72%

I assume this is due to the heavy bit shifting. If you would be so kind and add an alternative PutInt32 fkt, one that would expect the 24 bit payload in the lower 3 bytes, I could eliminate one >> shift per item. Also on your side one of three >> shifts would be eliminated. Something like this:

func (i24 *Int24) PutInt32alt(i32 int32) {
	if littleEndian {
		i24[0] = byte(i32)
		i24[1] = byte(i32 >> 8)
		i24[2] = byte(i32 >> 16)
	} else {
		i24[2] = byte(i32)
		i24[1] = byte(i32 >> 8)
		i24[0] = byte(i32 >> 16)
	}
}

(My naming of this fkt is not a serious proposal.)

With load up to 72% (for 24/96 content only) any additional CPU activity taking place in parallel is making "Output underflowed" messages very likely. Eliminating the bit-shifting is an obvious thing to consider. It is a shame the data cannot be handed over and processes without any bit shifting.

@gordonklaus
Copy link
Owner

@mehrvarz Respectfully, I do mind.

We don't do performance optimization based on speculation.

Please do more groundwork before posting to issues.

@mehrvarz
Copy link
Author

Facts:

96000 samples per second over two channels = 192K calls to PutInt32() per second.

4 bit-shift operations per call = 768K per second = 46M per minute - over the previous implementation.

+13% CPU load on average on a stock RPi2. Same for all content.

@gordonklaus
Copy link
Owner

What I meant by "groundwork" was: You have the code, you have the hardware, you have the assumption; now test your assumption. Implement it and see if it really makes a difference.

As for your reasoning: A constant shift should take 1 cycle. On a 1GHz CPU, 1M shifts per second equates to 0.1% utilization. Your 13% increase is most likely coming from somewhere else.

But don't take anyone's word for it. Measure.

@mehrvarz
Copy link
Author

mehrvarz commented Aug 20, 2018

cpu load of 24/96 audio playback on RPi2 (without native portaudio)

playflac int32          top 50%
playflac PutInt32()     top 63%  +13%
playflac PutInt32alt()  top 58%   +8%

These are the top values from looking at "top" for about a minute. Average load is like 5-7% lower. But the top value is crucial because it brings me so close to my friend "Output underflowed".

58 is better than 63 but more should be possible. For once: I could go back to int32. I still don't know what the exact consequences would be. Do you?

What if the two channels from the decoder were already interleaved? Are you familiar with the binary.Read padding fields? "When reading into structs, the field data for fields with blank ( ) field names is skipped; i.e., blank field names may be used for padding." What if Int24 was an array of 32bit structs with a padding byte? Could allow me (speculating now) to send the whole frame in one Go.

Help me a little.

@gordonklaus
Copy link
Owner

I imagine int32 would work just as well as Int24 in terms of audio quality. I'll admit I'm mystified by the difference in CPU usage. To figure out what exactly PortAudio is doing in these situations you'd either have to ask an expert or look at the source.

As for padding, that won't work. When you specify Int24, PortAudio expects the buffer to be packed with 24-bit samples, no padding. Using a struct with padding would defeat the purpose of defining type Int24 [3]byte.

@mehrvarz
Copy link
Author

I imagine int32 would work just as well as Int24 in terms of audio quality.

Probably true.

I'll admit I'm mystified by the difference in CPU usage.

I switched back to int32 and measured again. While the values are mostly between 47-50%, I sometimes see a 54 pop up. Either I am doing something wrong now, or I did last time. Maybe I just didn't wait long enough.

play 24/96 flac int32          top 54%
play 24/96 flac PutInt32()     top 63%
play 24/96 flac PutInt32alt()  top 58%
play 24/96 flac cvlc           top 54%

I also measured cvlc. It shows practically identical cpu load values. It is also prone to the very same underrun issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants