Limiter applied? #20

Closed
sclsj opened this issue Oct 15, 2022 · 18 comments

Comments

@sclsj

sclsj commented Oct 15, 2022

I'm still using the sample audio from the last issue, and I noticed that the first two channels are heavily limited; the limiting sounds much more obvious than in the stereo mix provided by the artist. Is this limiter applied by the producer or by the decoder?

@SakethSathuvalli
Member

Hi @sclsj,

The decoded output depends on the configuration enabled at the encoder end. The decoder itself does not do any post-processing apart from what the specification suggests. So if you observe that the first two channels are heavily limited, in all probability that is how the stream was encoded (as intended by the creator of the stream).

Please close the issue if this answers your question!

Thanks!

@sclsj
Author

sclsj commented Nov 27, 2022

Thank you!

The reason I'm asking is that it appears that if I select 2ch rather than 7.1ch as output, the first two channels are more limited/compressed, although I could be wrong about that.

I will close this issue when I'm able to extract the individual objects to see if they are compressed as well.
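One way to put a number on "more limited/compressed" is the crest factor (peak-to-RMS ratio) of each decoded channel: heavy limiting tends to lower it. The following is only a minimal sketch; the function names are made up for illustration, and it assumes the decoded samples are already available as a flat, interleaved list of numbers.

```python
import math


def deinterleave(samples, channels):
    """Split a flat interleaved sample list into one list per channel."""
    return [samples[ch::channels] for ch in range(channels)]


def crest_factor_db(samples):
    """Peak-to-RMS ratio of one channel in dB (NaN for pure silence)."""
    if not samples:
        raise ValueError("no samples")
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0:
        return math.nan
    return 20 * math.log10(peak / rms)


# Synthetic demo: a sine wave versus the same sine hard-clipped at half scale.
sine = [math.sin(2 * math.pi * k / 100) for k in range(1000)]
clipped = [max(-0.5, min(0.5, s)) for s in sine]

print("sine    crest factor (dB):", round(crest_factor_db(sine), 2))
print("clipped crest factor (dB):", round(crest_factor_db(clipped), 2))
```

Comparing this figure for channels 1 and 2 of the 2ch and 7.1ch outputs would show whether the difference is measurable, although a lower crest factor alone does not say where in the chain the limiting happened.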

@sclsj

This comment was marked as outdated.

@SakethSathuvalli
Member

Hi,

If the speaker layout of the bitstream differs from the speaker layout requested on the command line with the -cicp: option, the decoder converts the output to the requested layout using a rendering algorithm suitable for the file.

From the pictures you shared, it looks like the rendering result gives the impression that the signal is compressed. As mentioned earlier, the decoder does not apply a limiter of its own apart from the end-of-chain processing described in the specification.

Thanks!

@sclsj
Author

sclsj commented Jan 2, 2023

That makes sense. However, you did not answer my question. What can I do to prevent this?

@SakethSathuvalli
Member

Can you let us know the number of channels / objects in this stream and the speaker layout of the bitstream?

Also, can you please explain the "stereo mix provided by the artist" part? Can you elaborate on what this is?

@sclsj

This comment was marked as outdated.

@sclsj

This comment was marked as outdated.

@SakethSathuvalli
Member

Hi @sclsj,

Can you please let us know if you see the "limiting effect" when the command line option -cicp: is not used?

@sclsj

This comment was marked as outdated.

@sclsj
Author

sclsj commented Jan 2, 2023

Speaking of which, I would really like to see support for output in (decoding to) an object-based format such as ADM BWF. It makes more sense to call this decoding, and to call mixing those objects down to a specific channel layout rendering.

@SakethSathuvalli
Member

> For the "sample audio in the last issue" one: yes. Still quite obvious (especially in the first two channels).
>
> Screenshot 2023-01-02 13 48 08
>
> For the other example, not really. (Well, to be honest, it's still obvious in channel 3. That one does not change regardless of the cicp configuration, suggesting that quite a few objects are concentrated at that coordinate/direction/speaker, or just that the vocals are quite loud / high gain compared to everything else.)
>
> Screenshot 2023-01-02 13 50 33

Do the two pictures here correspond to different audio streams, or to the same stream decoded with different options? Can you please elaborate?

@SakethSathuvalli
Member

> Speaking of which, I would really like to see support for output in (decoding to) an object-based format such as ADM BWF. It makes more sense to call this decoding, and to call mixing those objects down to a specific channel layout rendering.

We don't currently support ADM BWF. However, it's possible to get the individual decoded objects using the -ext_ren: flag.

@sclsj
Author

sclsj commented Jan 2, 2023

Two different ones. First one is 群青, second one is Essence. Sorry for not making that clear.

@sclsj
Author

sclsj commented Jan 2, 2023

> > Speaking of which, I would really like to see support for output in (decoding to) an object-based format such as ADM BWF. It makes more sense to call this decoding, and to call mixing those objects down to a specific channel layout rendering.
>
> We don't currently support ADM BWF. However, it's possible to get the individual decoded objects using the -ext_ren: flag.

I saw this in the GSG docx. I tried the -ext_ren:1 flag, and I got _ext_ren_pcm.raw and _ext_ren_oam_md.bs in the executable folder (this agrees with the command line description but conflicts with the documentation). I found the section referred to in the documentation, but I still have some doubts:

17.10.6 Audio PCM data
The PCM data of the channels and objects interfaces shall be provided through the decoder PCM buffer, which first contains the regular rendered PCM signals (e.g. 12 signals for a 7.1+4 setup). Subsequently n_chan,out additional signals carry the PCM data of the originally transmitted channel representation. These are followed by n_obj,out signals carrying the PCM data of the un-rendered output objects. Then additional signals carry the n_HOA,out HOA data which number is indicated in the HOA metadata interface via the HOA order (e.g. 16 signals for HOA order 3). The HOA audio data in the HOA output interface is provided in the so-called equivalent spatial domain representation. The conversion from the HOA domain into the equivalent spatial domain representation and vice versa is described in Annex C.5.1.
The decoder shall signal the offset index of the PCM buffer for the first un-rendered output object and the offset index of the PCM buffer for the first HOA audio signal.

Well, that gives us 12 + 12 + 10 = 34 channels. Assuming 16-bit samples at 48000 Hz, that would result in a 747 MB file, but I got a 357 MB file.
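The buffer ordering described in 17.10.6 can be sketched as a small layout calculation. This is only an illustration of the quoted text: the class and field names are hypothetical and not part of the libmpegh API, and whether _ext_ren_pcm.raw actually follows this layout is not confirmed here. The counts 12, 12 and 10 are the ones used in the sum above.

```python
from dataclasses import dataclass


def hoa_signal_count(order: int) -> int:
    """Number of HOA signals for a given order: (order + 1) squared."""
    return (order + 1) ** 2


@dataclass(frozen=True)
class PcmBufferLayout:
    """Signal ordering per the quoted 17.10.6 text (hypothetical helper)."""
    n_rendered: int      # regular rendered signals, e.g. 12 for 7.1+4
    n_chan_out: int      # originally transmitted channel signals
    n_obj_out: int       # un-rendered output object signals
    n_hoa_out: int = 0   # HOA signals, if any

    @property
    def object_offset(self) -> int:
        """Index of the first un-rendered object signal."""
        return self.n_rendered + self.n_chan_out

    @property
    def hoa_offset(self) -> int:
        """Index of the first HOA signal."""
        return self.object_offset + self.n_obj_out

    @property
    def total_signals(self) -> int:
        return self.hoa_offset + self.n_hoa_out


layout = PcmBufferLayout(n_rendered=12, n_chan_out=12, n_obj_out=10)
print(layout.object_offset, layout.hoa_offset, layout.total_signals)  # 24 34 34
print(hoa_signal_count(3))  # 16, matching the spec's HOA order 3 example
```

Under this reading, the objects would sit at signal indices 24 to 33 of each frame.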

When I try to decode it, I also get (mostly) garbage channels (channels with random noise). Not sure what I'm doing wrong here. I'm using: ffmpeg -f s16le -ar 48k -ac 15 -i /Users/jin/Desktop/libmpegh/_ext_ren_pcm.raw /Users/jin/Desktop/libmpegh/_ext_ren_pcm.wav. The channel count of 15 comes from a rough estimate based on file size. I also tried other values, ranging from 2 to 35 channels, but in every case either all or most of the channels are noise.
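The file-size estimate can be written out explicitly. This is a rough sketch under stated assumptions: the track duration is not given in this thread, so 229 s is back-computed from the 747 MB figure, "MB" is taken as 10^6 bytes, and the function names are illustrative only.

```python
def raw_pcm_bytes(channels, duration_s, sample_rate=48000, bytes_per_sample=2):
    """Size in bytes of headerless interleaved PCM."""
    return round(channels * duration_s * sample_rate * bytes_per_sample)


def implied_channels(file_bytes, duration_s, sample_rate=48000, bytes_per_sample=2):
    """Channel count implied by a raw PCM file size (may be fractional)."""
    return file_bytes / (duration_s * sample_rate * bytes_per_sample)


# Assumption: duration back-computed from the 747 MB figure, not a known value.
DURATION_S = 229

print("34 channels, 16-bit:", raw_pcm_bytes(34, DURATION_S), "bytes")

# Try several sample widths, since the raw file's sample format is an assumption too.
for width in (2, 3, 4):
    n = implied_channels(357_000_000, DURATION_S, bytes_per_sample=width)
    print(f"{width} bytes/sample -> about {n:.2f} channels")
```

A result far from a whole number suggests that the assumed duration, sample width, or unit is off, so it may be worth checking the other widths before settling on a channel count for ffmpeg.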

Is there a flag I can use for the tool to output a wav instead of a raw pcm?
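Independently of any decoder flag, headerless PCM can be wrapped in a WAV container with Python's standard `wave` module once the channel count and sample format are known. This is a minimal sketch: `raw_to_wav` is a hypothetical helper, and the defaults (48 kHz, 16-bit little-endian, matching the s16le guess in the ffmpeg command) are assumptions about _ext_ren_pcm.raw, not confirmed properties.

```python
import wave


def raw_to_wav(raw_path, wav_path, channels, sample_rate=48000,
               sample_width=2, chunk_frames=65536):
    """Copy interleaved little-endian PCM into a WAV file, chunk by chunk."""
    frame_bytes = channels * sample_width
    with open(raw_path, "rb") as src, wave.open(wav_path, "wb") as dst:
        dst.setnchannels(channels)
        dst.setsampwidth(sample_width)
        dst.setframerate(sample_rate)
        while True:
            chunk = src.read(frame_bytes * chunk_frames)
            if not chunk:
                break
            # Drop a trailing partial frame, which would indicate a wrong
            # channel count or sample width guess.
            usable = len(chunk) - len(chunk) % frame_bytes
            dst.writeframes(chunk[:usable])
```

A call such as `raw_to_wav("_ext_ren_pcm.raw", "_ext_ren_pcm.wav", channels=15)` mirrors the ffmpeg command above. If the channel count or sample width is wrong, the result will still be noise, because the samples get assigned to the wrong channels.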

Also, if I read the specification right, according to 17.10.3 objects are still processed (DRC, gain, and peak limiter) before they are exported. Can I disable that?

@SakethSathuvalli
Member

Hi @sclsj

Can you please refer to our wiki page on external rendering interfaces?

Thanks!

@SakethSathuvalli
Member

Hi @sclsj,

Can you please close this issue if it is similar to what is being discussed in #19?

Thanks!

@sclsj
Author

sclsj commented Feb 9, 2023

Yes, it’s kind of the same thing. I’m having some other related issues but I need to investigate further before posting them.

@sclsj sclsj closed this as completed Feb 9, 2023