Encoder configuration

CoolOppo edited this page Apr 19, 2015 · 30 revisions

Selecting output mode

You must pick one of the following mutually exclusive output modes:

output mode     option
AAC TVBR        --tvbr, -V (default)
AAC CVBR        --cvbr, -v
AAC ABR         --abr, -a
AAC CBR         --cbr, -c
ALAC            --alac, -A
PCM (decode)    --decode, -D
Playback        --play
Peak scanning   --peak

The last two are exceptional in that they don't write to a file.

The AAC output mode options take an integer value (option argument) to specify bitrate or quality. --tvbr takes a quality value from 0 to 127 (a bigger value results in better quality and a larger file size). The others take a bitrate in kbps. For example, -a 128 means ABR 128kbps. When you don't specify an output mode at all, --tvbr 91 is chosen as the default configuration.

For CBR, ABR, and CVBR, a bitrate value less than 8 is taken as bits per sample. The bitrate is then calculated as follows:

bitrate = bits per sample * number of channels(excluding LFE) * sample rate

For example, --cvbr 1.5 is equivalent to --cvbr 144 for 48kHz, 2ch input (1.5 * 2 * 48000 = 144000 bps).
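The bits-per-sample rule above can be sketched in a few lines of Python. This is only an illustration of the arithmetic described here, not qaac's actual code; the function name and parameters are hypothetical.

```python
def effective_bitrate_kbps(value, channels_excluding_lfe, sample_rate_hz):
    """Interpret a CBR/ABR/CVBR argument as described in the text.

    Hypothetical helper for illustration; qaac's internal logic may differ.
    """
    if 0 < value < 8:
        # Values below 8 are treated as bits per sample.
        return value * channels_excluding_lfe * sample_rate_hz / 1000
    # Otherwise the value is already a bitrate in kbps
    # (0 is special and means "highest bitrate available").
    return value


print(effective_bitrate_kbps(1.5, 2, 48000))  # 144.0
print(effective_bitrate_kbps(128, 2, 44100))  # 128
```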

For the AAC output modes there is another option named --quality (-q), which you might find confusing. Unlike bitrate or TVBR quality, this option controls not the quality/size trade-off but the quality/speed trade-off. A bigger value means higher quality (or smaller file size) and slower encoding. By default, qaac chooses --quality 2. You probably don't have to touch this option unless you have a special reason to do so.

For AAC output modes other than TVBR, you can additionally set --he to enable HE encoding mode.

Selecting output file format

m4a is the default output file format for the AAC and ALAC output modes, and WAV is the default output file format for the PCM output mode. You can additionally set --adts (for AAC) or --caf (for everything) to change the output file format.

TVBR quality steps

Although the TVBR option accepts any value in the 0-127 range, the AAC codec internally has only 15 functional quality steps, so the value is rounded to one of the following:

0 9 18 27 36 45 54 63 73 82 91 100 109 118 127

For M4A output, you can see this "actual quality value" written in the tool tag.
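As a rough illustration, the snippet below maps an arbitrary TVBR value to one of the 15 steps. It assumes rounding to the nearest step, which this page does not actually specify, so treat the result as a guess and check the tool tag for the value qaac really used. The function name is hypothetical.

```python
# The 15 functional TVBR quality steps listed above.
TVBR_STEPS = [0, 9, 18, 27, 36, 45, 54, 63, 73, 82, 91, 100, 109, 118, 127]


def nearest_tvbr_step(value):
    """Return the step closest to `value` (assumption: nearest-step rounding).

    On a tie, this sketch picks the lower step; qaac may behave differently.
    """
    if not 0 <= value <= 127:
        raise ValueError("TVBR quality must be in the range 0-127")
    return min(TVBR_STEPS, key=lambda step: abs(step - value))


print(nearest_tvbr_step(90))  # 91
print(nearest_tvbr_step(64))  # 63
```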

63 might be a good place to start. Since it sits in the middle of the range, you might take it as "mid quality". However, its typical bitrate for Red Book (CD audio) format is around 128kbps, which is usually considered good for AAC (remember that the iTunes Music Store used 128kbps ABR before iTunes Plus came out). Of course, nobody but you can decide which setting is appropriate for you. Decide for yourself using your ears.

Available bitrate

The bitrate values available for CVBR/ABR/CBR vary with the number of channels, the sample rate, and the SBR option. You can list the available combinations of channel layout, sample rate, and bitrate with the following command:

qaac --formats

A bitrate of 0 means "highest bitrate available". Therefore, --cvbr 0 is the same as --cvbr 320 for 2ch, 44.1kHz input.

Relation to iTunes encoder setting

Just for your information, the following are the iTunes import settings and their qaac equivalents (as of 2011/10/11, iTunes 10.4.1).

iTunes setting         qaac equivalent
High Quality (128k)    -a128 -q1
iTunes Plus (256k)     -v256 -q2
Custom (VBR on)        -v <bitrate> -q2
Custom (VBR off)       -a <bitrate> -q2

As you can see, iTunes uses only ABR or CVBR. If you want the same result as iTunes Plus, just use -v256 -q2 (-q2 is the default, so -v256 alone is enough).

Encoder delay

Like other MDCT-based lossy codecs such as MP3, Vorbis, or Opus, AAC has a certain amount of encoder/decoder delay, and also padding at the end due to fixed-size frames. In other words, a certain amount of silence is prepended to the beginning and appended to the end of the output. qaac offers some options concerning the delay.
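To make the delay/padding layout concrete, here is a minimal sketch that computes how many frames an input occupies and the minimum padding needed to fill the last frame. It assumes 1024 samples per AAC LC frame and that the encoder pads only up to the next frame boundary; neither is stated on this page, and the real encoder may add more padding (for example, the extra SBR frame mentioned later). The function name is hypothetical.

```python
FRAME_SIZE = 1024  # samples per AAC LC frame (assumption, not from this page)


def encoded_layout(input_samples, priming=2112, frame_size=FRAME_SIZE):
    """Return (frame_count, minimum_padding) for the given input length.

    The stream is laid out as: priming samples, input, trailing padding.
    This only models the minimum padding needed to complete the last frame.
    """
    total = priming + input_samples
    frames = -(-total // frame_size)  # ceiling division
    padding = frames * frame_size - total
    return frames, padding


# 10000 input samples with the default 2112 priming samples.
print(encoded_layout(10000))  # (12, 176)
```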

--num-priming

Specifies the number of priming samples (delay) from 0 to 2112, where 2112 is the default delay of the Apple AAC codec. This option is only applicable to AAC LC. A smaller value means a shorter delay.

1024 or greater should be safe. In many cases, it seems that you can go as low as 576 (= 448 + 128, where 448 is the number of samples borrowed from the previous frame in the short block case, and 128 is the size of a short block) and still achieve perfect gapless playback. However, considering the long block case and the fact that faad (CLI frontend) discards the first 1024 samples, a value smaller than 1024 cannot be called always safe.

When the number of priming samples is X where X < 576, the decoder cannot be expected to reconstruct at least the first 576 - X samples. Therefore, you should avoid such values unless that portion of the input is known to be silent.
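The rule above is simple arithmetic, sketched here for a quick check before choosing a --num-priming value. The helper name is hypothetical, and the 576 threshold is taken directly from the text.

```python
SHORT_BLOCK_MINIMUM = 576  # 448 borrowed samples + 128-sample short block


def unreconstructable_samples(num_priming):
    """Lower bound on leading input samples the decoder cannot reconstruct."""
    if not 0 <= num_priming <= 2112:
        raise ValueError("--num-priming accepts values from 0 to 2112")
    return max(0, SHORT_BLOCK_MINIMUM - num_priming)


print(unreconstructable_samples(0))     # 576
print(unreconstructable_samples(500))   # 76
print(unreconstructable_samples(1024))  # 0
```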

--gapless-mode

Specifies how to describe the amount of delay/padding in the M4A container.

0 (default)   iTunSMPB
1             ISO standard
2             Both

iTunSMPB is a special tag describing the amount of delay and padding, used by iTunes, Nero, and FhG encoders.

The ISO standard way instead uses MP4 boxes such as elst (Edit List), sbgp (Sample to Group), and sgpd (Sample Group Description).

The non-standard iTunSMPB is the one commonly supported by music players such as foobar2000 or Rockbox. However, since it is written as a file-global tag, it is not suited for multiplexing into MP4 files containing multiple tracks.

Smart padding

Even with players capable of gapless playback, tiny glitches can sometimes be audible at the transition when the cut point is not digitally silent. From version 2.33, qaac applies smarter padding by extrapolating the beginning and ending of the input in order to minimize this possibility. The following images show how it works:

Image 1: Input
Image 2: Without smart padding
Image 3: With smart padding

As shown in the image, this example input does not start or end at zero. To the encoder, this looks like a very sharp discontinuous transient (a cliff), which can be problematic.

The second image (without smart padding) shows how the input gets encoded when smart padding is disabled; iTunes encodes like this. Compared to the input, silence is prepended/appended due to encoder delay and padding.
How smoothly the result connects to the previous / next song depends on how well the encoder could encode the cliffs at the beginning/ending. Generally, the lower the bitrate, the more likely you are to hear glitches.

The third image shows the encoded result when smart padding is applied. As shown in the image, the beginning/ending is extended (to be exact, "extrapolated") by linear prediction.
The amount of delay/padding remains the same, and the extrapolated portion is expected to be trimmed out on the player side.
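For readers unfamiliar with extrapolation by linear prediction, the following is a minimal pure-Python sketch of the general idea: fit prediction coefficients to the end of the signal, then run the predictor forward. The method (autocorrelation plus Levinson-Durbin), the prediction order, and all names are assumptions made for illustration; this page does not say how qaac implements it.

```python
import math


def lpc_coefficients(signal, order):
    """Fit coefficients c so that x[n] ~ sum(c[j] * x[n - 1 - j]).

    Uses the autocorrelation method with the Levinson-Durbin recursion.
    """
    n = len(signal)
    r = [sum(signal[i] * signal[i + lag] for i in range(n - lag))
         for lag in range(order + 1)]
    coeffs = []
    err = r[0]
    for k in range(order):
        if err <= 0.0:
            break  # silent or degenerate input: stop early
        acc = r[k + 1] - sum(coeffs[j] * r[k - j] for j in range(k))
        refl = acc / err
        coeffs = [coeffs[j] - refl * coeffs[k - 1 - j] for j in range(k)] + [refl]
        err *= 1.0 - refl * refl
    return coeffs


def extrapolate(signal, count, order=2):
    """Return `count` predicted samples that continue `signal`."""
    coeffs = lpc_coefficients(signal, order)
    out = list(signal)
    for _ in range(count):
        out.append(sum(c * out[-1 - j] for j, c in enumerate(coeffs)))
    return out[len(signal):]


# A sine wave that is cut off abruptly; the extension continues the wave
# instead of dropping to silence.
tone = [math.sin(2 * math.pi * i / 40) for i in range(200)]
extension = extrapolate(tone, 64)
```

To extend the beginning of the input rather than the end, the same routine can be applied to the reversed signal and the result reversed again.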

By default, qaac applies smart padding. For SBR, as a workaround for a bug in the CoreAudio SBR encoder, qaac also appends one extra padding frame to the end. You can change this default behavior by setting --no-smart-padding to get a result bit-identical to iTunes.