# Audio
## General
+ The audible spectrum ranges from 20 Hz to 20 kHz. The ear's response is not flat across that band, nor is it monotonically increasing: its sensitivity increases with frequency up to a point and then varies irregularly, as the equal-loudness curves below show:

<img src="equal_loudness.png" width=400/>

+ Human speech lies in the 125 Hz to 8 kHz range
+ The dynamic range of the ear is from 0 dB (corresponding to 20 µPa pressure on the diaphragm) to 120 dB (averaged across the bandwidth?). Noise above 90 dB may cause hearing impairment

## Measurements
+ Decibel level in audio is the sound level relative to the human hearing threshold, which is 20 micropascal (20 µPa) of pressure on a diaphragm (so 0 dB is the human hearing threshold). It is measured using a sound level meter, in which a diaphragm converts pressure to an electrical signal, and the decibel level is obtained as

$$
\text{Decibel level} = 10\log_{10}\left(\dfrac{\text{noise power}}{\text{power at } 20\ \mu\text{Pa}}\right)
$$

+ **Typical noise decibel levels**:

<img src="decibel_chart.jpg" width=400/>

So, compared to light rain (40 dB), a normal conversation (50 dB) is 10 times more powerful (in energy per second), and a noisy restaurant is 10,000 times more powerful! Perceived loudness grows much more slowly than power.
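
The decibel arithmetic above can be checked with a small sketch. The function names `spl_db` and `power_ratio` are made up for illustration. Since power is proportional to pressure squared, the $10\log_{10}$ of a power ratio is written as $20\log_{10}$ of the pressure ratio.

```python
import math

P_REF_PA = 20e-6  # reference pressure: 20 micropascals (hearing threshold)

def spl_db(pressure_rms_pa: float) -> float:
    """Decibel level of an RMS sound pressure given in pascals."""
    # Power goes as pressure squared, so 10*log10(power ratio)
    # equals 20*log10(pressure ratio).
    return 20.0 * math.log10(pressure_rms_pa / P_REF_PA)

def power_ratio(db_a: float, db_b: float) -> float:
    """How many times more powerful a sound at db_a is than one at db_b."""
    return 10.0 ** ((db_a - db_b) / 10.0)

print(spl_db(20e-6))             # the hearing threshold itself
print(round(spl_db(20.0), 1))    # 20 Pa, a million times the reference pressure
print(power_ratio(50.0, 40.0))   # conversation vs. light rain
print(power_ratio(80.0, 40.0))   # a level 40 dB above light rain
```

A 10 dB step is always a factor of 10 in power, so a 40 dB step is a factor of 10,000.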

+ dBFS - decibels relative to full scale is a measurement that is defined only for digital signals (and not analog signals). 0 dBFS is the maximum possible digital level; a signal at half of the full-scale amplitude is at about -6 dBFS, and so on. The SQNR [formula for a quantizer](#Quantizer-noise), with its sign flipped, gives the lowest achievable noise floor in dBFS for a system that deals with an $n$-bit signal. For example, for a 16-bit signal, the noise floor cannot be lower than $-(6.02 \times 16+1.761) \approx -98$ dBFS.
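
A minimal sketch of the two dBFS numbers quoted above. The helper names are hypothetical, and full scale is assumed to be an amplitude of 1.0.

```python
import math

def dbfs(amplitude: float, full_scale: float = 1.0) -> float:
    """Level of an amplitude relative to digital full scale, in dBFS."""
    return 20.0 * math.log10(abs(amplitude) / full_scale)

def ideal_noise_floor_dbfs(n_bits: int) -> float:
    """Lowest achievable quantization noise floor for an n-bit signal."""
    # Negative of the sine-wave SQNR formula, 6.02 n + 1.761 dB.
    return -(6.02 * n_bits + 1.761)

print(round(dbfs(1.0), 2))                    # full scale
print(round(dbfs(0.5), 2))                    # half of full scale
print(round(ideal_noise_floor_dbfs(16), 2))   # 16-bit audio
print(round(ideal_noise_floor_dbfs(24), 2))   # 24-bit audio
```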

+ THD - Total harmonic distortion. When a sinusoid passes through an LTI system, the output is always a sinusoid of the same frequency as the input, but possibly with a different amplitude and phase. However, if a sinusoid passes through a non-LTI system, we cannot predict in general what the output would look like. If the system is approximately LTI, it makes sense to think that the output for a sinusoidal input will mostly contain the input's fundamental frequency, with the system's imperfections introducing extra harmonics. THD (fundamental) for a given sinusoid is the ratio of the combined RMS level of the harmonics to the RMS level of the fundamental. With $V_1$ the RMS level of the fundamental and $V_k$ that of the $k$-th harmonic:

$$
THD_F = \dfrac{\sqrt{V^2_2 + V^2_3 + V^2_4 + \ldots}}{V_1}
$$

$THD_F$ is typically represented in dB ($20log_{10}$ of the above quantity). Measurements of THD aren't standardized: how many harmonics to include in the formula is itself an open question. Besides that, a single sinusoid's $THD_F$ is not a meaningful measure of a system's non-linearity. Different companies follow different methods when measuring the THD of their equipment (microphones, headphones etc.). Some use a single reference frequency, some use a few frequencies evenly sampled across the bandwidth, and so on.
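
To make the $THD_F$ formula concrete, here is a rough sketch that measures it for a made-up, mildly non-linear system $y = x + 0.02x^2$. The system, the helper names and the choice of five harmonics are all assumptions for illustration. Each harmonic's RMS level is read off with a single-bin DFT, which is exact here because the record holds a whole number of periods.

```python
import math

def harmonic_rms(samples, k, cycles):
    """RMS level of the k-th harmonic; samples must span `cycles` whole periods."""
    n = len(samples)
    w = 2.0 * math.pi * k * cycles / n
    re = sum(x * math.cos(w * i) for i, x in enumerate(samples))
    im = sum(x * math.sin(w * i) for i, x in enumerate(samples))
    peak = 2.0 * math.hypot(re, im) / n  # single-bin DFT scaled to peak amplitude
    return peak / math.sqrt(2.0)

def thd_f(samples, cycles, n_harmonics=5):
    """THD_F using harmonics 2 .. n_harmonics + 1 (an arbitrary cut-off)."""
    v1 = harmonic_rms(samples, 1, cycles)
    harmonics = [harmonic_rms(samples, k, cycles)
                 for k in range(2, n_harmonics + 2)]
    return math.sqrt(sum(v * v for v in harmonics)) / v1

# Hypothetical system under test: y = x + 0.02 * x^2
N, CYCLES = 1024, 8
x = [math.sin(2.0 * math.pi * CYCLES * i / N) for i in range(N)]
y = [v + 0.02 * v * v for v in x]

ratio = thd_f(y, CYCLES)
print(f"THD_F = {ratio:.4f} ({20.0 * math.log10(ratio):.1f} dB)")
```

Since $\sin^2\theta = (1-\cos 2\theta)/2$, the squared term puts a second harmonic of amplitude 0.01 next to a unit fundamental, so the sketch reports a $THD_F$ of 0.01, i.e. -40 dB.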

+ THDN - Total harmonic distortion plus noise. It is measured by feeding a sinusoid into the system under test and using a notch filter at the system's output to remove the fundamental. The RMS of the notch-filtered output, which now holds only harmonics and noise, is compared to the RMS of the fundamental:

$$
THDN = \dfrac{\text{RMS of notch-filtered output}}{V_1}
$$

As with THD, unless accompanied by more information such as the fundamental frequency used, the bandwidth of the output measurement etc., a THDN figure alone won't make much sense.
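
A sketch of a THDN measurement on a synthetic signal. The "notch filter" here is idealized: the fundamental is removed by fitting a sine and cosine at its frequency and subtracting the fit. The test signal (1% third harmonic plus a little Gaussian noise) and all names are made up for illustration.

```python
import math
import random

def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))

def thd_n(samples, cycles):
    """THDN of a record holding `cycles` whole periods of the fundamental."""
    n = len(samples)
    w = 2.0 * math.pi * cycles / n
    # Least-squares fit of the fundamental (cosine and sine parts).
    a = 2.0 / n * sum(x * math.cos(w * i) for i, x in enumerate(samples))
    b = 2.0 / n * sum(x * math.sin(w * i) for i, x in enumerate(samples))
    # Idealized notch: subtract the fitted fundamental from the output.
    residual = [x - a * math.cos(w * i) - b * math.sin(w * i)
                for i, x in enumerate(samples)]
    v1 = math.hypot(a, b) / math.sqrt(2.0)  # RMS level of the fundamental
    return rms(residual) / v1

# Hypothetical system output: fundamental + 1% third harmonic + noise
N, CYCLES = 4096, 16
rng = random.Random(0)
out = [math.sin(2.0 * math.pi * CYCLES * i / N)
       + 0.01 * math.sin(2.0 * math.pi * 3 * CYCLES * i / N)
       + rng.gauss(0.0, 0.001)
       for i in range(N)]

value = thd_n(out, CYCLES)
print(f"THDN = {value:.4f} ({20.0 * math.log10(value):.1f} dB)")
```

The result sits slightly above the 0.01 that the third harmonic alone would give, because the noise power is added to the harmonic power.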

+ THD and THDN are both applicable to analog as well as digital systems. In a discrete-time LTI filter, there won't be any THD/THDN. But since digital filters are only nominally LTI due to finite register length effects, they will introduce harmonic distortions and quantization noise

## Quantization
### Why increasing the sampling frequency by 4 times is the same as having 1 extra bit in the quantizer
Quantization (A/D conversion) at a sample rate Fs and a certain number
of bits N produces a total noise power Pn that is a function of N but not
a function of Fs. So, for any given N, you get Pn watts of noise power
no matter if the sample rate is 1 sample per second or 1E9 samples per
second. 

The next thing to understand is that the quantization noise power Pn
is spread evenly across all frequencies in the digital signal. So if
you're sampling at Fs samples per second, the noise will be Pn/(Fs/2)
watts per Hertz. 

For example, if Pn is 1 watt and Fs is 2000 samples/second, then there
will be Pn/(Fs/2) = 1 / 1000 = 0.001 watt/Hz of noise power in
the frequency domain. So in a 1 Hz bandwidth you'd have 1 milliwatt;
in a 10 Hz bandwidth you'd have 10 milliwatts; in the total bandwidth
of 1000 Hz you'd have 1 watt.

Now to the main point. Let's say the total signal power in the
bandwidth of interest is Ps, the total noise power due to quantization
is Pn, and the signal bandwidth is B Hz. If you sample at Fs = 2*B
samples/second, then the amount of quantization noise power Pnb in
the signal bandwidth B is

$$
\begin{align}
P_{nb} &= B \cdot \dfrac{P_n}{F_s/2} \\
       &= B \cdot \dfrac{P_n}{B} \\
       &= P_n
\end{align}
$$

and the SNR is then

$$
\begin{align}
SNR &= \dfrac{P_s}{P_{nb}} \\
    &= \dfrac{P_s}{P_n}
\end{align}
$$

So at this sample rate all of the quantization noise lies inside the
signal band, and there's nothing you can do to improve the SNR.

However, if you oversample by some amount, say, Fs = M*2*B samples
per second, where M > 1, where B is the bandwidth of interest,
then the amount of quantization noise power Pnb in the signal bandwidth B is

$$
\begin{align}
P_{nb} &= B \cdot \dfrac{P_n}{F_s/2} \\
       &= B \cdot \dfrac{P_n}{M B} \\
       &= \dfrac{P_n}{M}
\end{align}
$$

and the SNR is then

$$
\begin{align}
SNR &= \dfrac{P_s}{P_{nb}} \\
    &= M \cdot \dfrac{P_s}{P_n}
\end{align}
$$

Thus you have improved your SNR by a factor of M. So, if M = 2, you
improve the SNR by 3 dB; if M = 4, you improve the SNR by 6 dB (or
1 bit); etc.

So, in summary, note the following:

  1. what is important is the SNR in the bandwidth of interest. 
  2. oversampling improves the SNR in the bandwidth of interest
  3. decimation (assuming you prefilter with the appropriate
  lowpass filter) does not destroy the SNR improvement in the
  bandwidth of interest.
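
The bookkeeping above fits in a few lines. This is only a sketch of the arithmetic, with made-up function names, and it assumes (as the argument does) that the quantization noise is white over 0 to Fs/2.

```python
import math

def in_band_noise_power(pn, fs, bandwidth):
    """Noise power inside `bandwidth` Hz when Pn is spread evenly over 0..fs/2."""
    return bandwidth * pn / (fs / 2.0)

def oversampling_gain_db(m):
    """SNR improvement from oversampling by a factor m, then filtering to B."""
    return 10.0 * math.log10(m)

def equivalent_extra_bits(m):
    """The same gain expressed in quantizer bits (about 6.02 dB per bit)."""
    return oversampling_gain_db(m) / (20.0 * math.log10(2.0))

# The 1 W, 2000 samples/second example: noise in a 10 Hz band
print(in_band_noise_power(1.0, 2000.0, 10.0))

for m in (1, 2, 4, 16):
    print(m, round(oversampling_gain_db(m), 2), round(equivalent_extra_bits(m), 2))
```

Each factor of 4 in sample rate buys one extra bit, so plain oversampling is an expensive way to gain resolution.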

### Quantizer noise
Quantizer noise is the rounding error (or truncation error) that results from the very operation of quantizing. A quantizer can be modelled as a noise source whose output adds to the input signal to form the output signal. Quantization noise is correlated with the signal, and different types of signals produce different types of quantization noise. An SQNR (Signal to Quantization Noise Ratio) derived for one type of signal is, in general, not applicable to a different type of signal. But some facts about quantization noise are the same for all types of input signals, as listed below
 + The magnitude of the rounding error can never be greater than $1/2$ LSB. So if the quantizer noise is a random variable, its distribution, whatever it may be, will always be bounded. In other words, the noise cannot have a Gaussian distribution stretching from $-\infty$ to $\infty$
 + Quantizer noise isn't frequency dependent (its power is spread evenly across the band)
 + For a given signal, as the number of quantizer bits increases, the quantization noise decreases.


**When the input signal is a uniformly distributed RV**: Suppose the signal is uniformly distributed from 0 volts to $V_{max}$ volts. Then the signal power, i.e., the variance, will be $\dfrac{\left(V_{max}-0\right)^2}{12}$. The quantizer noise will also be uniformly distributed between $-\dfrac{1}{2} \dfrac{V_{max}}{2^n}$ volts and $\dfrac{1}{2} \dfrac{V_{max}}{2^n}$ volts, i.e., $Noise \sim U\left( -\dfrac{V_{max}}{2^{n+1}}, \dfrac{V_{max}}{2^{n+1}} \right)$, where $n$ is the number of bits in the quantizer. The noise power, i.e., its variance, is hence $\dfrac{\left(V_{max}/2^n\right)^2}{12}$. So the SQNR is:

$$
\begin{align}
SQNR &= 10log_{10}\left(\dfrac{V_{max}^2/12}{\left(V_{max}/2^n\right)^2/12}\right) \\
     &= 10log_{10}\left(2^n\right)^2 \\
     &= 20log_{10} 2^n \\
     &= n*20log_{10} 2 \\
     &\approx n\ *\ 6.02\ dB
\end{align}
$$
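
As a sanity check on the $6.02n$ dB result, the sketch below quantizes uniformly distributed random samples and measures the SQNR directly. The quantizer (a uniform one that outputs the centre of each step), the sample count and all names are assumptions made for illustration.

```python
import math
import random

def quantize(x, n_bits, v_max):
    """Uniform n-bit quantizer over [0, v_max]; returns the centre of the step."""
    q = v_max / 2 ** n_bits
    index = min(int(x / q), 2 ** n_bits - 1)  # clamp x == v_max into the top step
    return (index + 0.5) * q

def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def measured_sqnr_db(n_bits, v_max=1.0, n_samples=100_000, seed=0):
    """SQNR in dB measured on uniformly distributed input samples."""
    rng = random.Random(seed)
    signal = [rng.uniform(0.0, v_max) for _ in range(n_samples)]
    error = [quantize(x, n_bits, v_max) - x for x in signal]
    return 10.0 * math.log10(variance(signal) / variance(error))

for n in (4, 8, 12):
    predicted = n * 20.0 * math.log10(2.0)
    print(n, round(measured_sqnr_db(n), 2), round(predicted, 2))
```

The measured and predicted columns should agree to within a small fraction of a dB, with the remaining difference coming from the finite number of random samples.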

**When the input signal is a sine wave**: The quantization noise for a sine wave is shown below. When the number of bits $n$ is large, the sine wave can be considered approximately linear within any one quantization interval. In this case, the error is more or less a sawtooth wave, as in the second figure.

<img src="sine_quantized.gif" width=640/>
<img src="sawtooth.png" width= 640/>

The RMS value of a sine wave is $A/\sqrt2$, where the sine wave is swinging from $-A$ to $A$. In our case, if the sine wave fits the full range of the quantizer, then $A = q*\dfrac{1}{2} 2^n$, where $q$ is the voltage level of 1 LSB. So the signal RMS value is $\dfrac{q*2^n}{2\sqrt{2}}$.

For the sawtooth noise, one can find the RMS value as:

$$
\begin{align}
RMS\ noise &= \sqrt{\dfrac{1}{T} \int_{-T/2}^{T/2} \left(\dfrac{qt}{T}\right)^2 dt} \\
           &= \sqrt{\dfrac{q^2}{T^3} \int_{-T/2}^{T/2} t^2 dt} \\
           &= \sqrt{\dfrac{q^2}{T^3} \left( \dfrac{\dfrac{T^3}{8}+\dfrac{T^3}{8}}{3} \right)} \\
           &= \dfrac{q}{\sqrt{12}}
\end{align}
$$

where the integrand is the equation of a straight line: over one sawtooth period $T$, the error ramps linearly from $-q/2$ to $q/2$.

The SQNR can be calculated as:

$$
\begin{align}
SQNR &= 20log_{10} \left( \dfrac{\dfrac{q*2^n}{2\sqrt{2}}}{\dfrac{q}{\sqrt{12}}} \right) \\
     &= 20log_{10} \left( 2^n \dfrac{\sqrt{12}}{\sqrt{8}} \right) \\
     &= 20log_{10} 2^n + 20log_{10} \left( \sqrt{\dfrac{3}{2}} \right) \\
     &\approx \left( 6.02n + 1.761 \right)\ dB
\end{align}
$$
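
The $6.02n + 1.761$ dB figure can be checked numerically in the same spirit. The sketch below quantizes a full-scale sine and measures the ratio of signal power to error power. The quantizer model, the golden-ratio frequency (chosen only so that the sampled phases spread evenly over the cycle) and all names are assumptions for illustration.

```python
import math

def quantize_bipolar(x, q, n_bits):
    """Uniform n-bit quantizer with step q covering [-q*2^(n-1), q*2^(n-1)]."""
    half = 2 ** (n_bits - 1)
    index = max(-half, min(math.floor(x / q), half - 1))
    return (index + 0.5) * q  # centre of the step, so |error| <= q/2

def sine_sqnr_db(n_bits, n_samples=50_000):
    """Measured SQNR in dB for a sine that fills the quantizer's full range."""
    q = 1.0
    amplitude = q * 2 ** n_bits / 2.0          # A = q * (1/2) * 2^n
    cycles_per_sample = (math.sqrt(5.0) - 1.0) / 2.0
    signal_power = 0.0
    noise_power = 0.0
    for i in range(n_samples):
        x = amplitude * math.sin(2.0 * math.pi * cycles_per_sample * i)
        e = quantize_bipolar(x, q, n_bits) - x
        signal_power += x * x
        noise_power += e * e
    return 10.0 * math.log10(signal_power / noise_power)

for n in (8, 10, 12):
    print(n, round(sine_sqnr_db(n), 2), round(6.02 * n + 1.761, 2))
```

The measured values land close to the formula rather than exactly on it, since the sawtooth approximation gets better as $n$ grows.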

### Dither
When an analog signal is quantized, or when a digital signal is requantized to a lower resolution, the quantizer error is correlated with the signal, and the human ear perceives correlated noise as more irritating than even a higher level of uncorrelated noise. Dither added to the signal *before* quantization results in uncorrelated noise that is more acceptable. One way to think about dither is that, if you round the value 4.8, it always results in 5. However, if we add uniform noise in (-0.5, 0.5), the signal level 4.8 gets converted into a uniform R.V. varying between 4.3 and 5.3. This means that 20% of the time (when the dithered value is 4.3 or 4.4) the original value of 4.8 will be quantized to 4, and 80% of the time (when the dithered value is 4.5, 4.6, 4.7, 4.8, 4.9, 5, 5.1, 5.2) to 5. If we average 20% 4s and 80% 5s, we get 4.8. So if the input signal is a sine wave, and one were to take the quantizer output, superimpose several periods and average the result, it would look like a smooth sine wave instead of a quantized, step-like sine wave. The averaging happens inside the human ear.
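
The 4.8 example can be simulated directly. This is a minimal sketch with hypothetical helper names, using uniform dither in (-0.5, 0.5) and rounding to the nearest integer.

```python
import math
import random

def quantize(x):
    """Round to the nearest integer level (halves round up)."""
    return math.floor(x + 0.5)

def quantize_with_dither(x, rng):
    """Add uniform dither in (-0.5, 0.5) before rounding."""
    return quantize(x + rng.uniform(-0.5, 0.5))

rng = random.Random(0)
trials = 100_000
outputs = [quantize_with_dither(4.8, rng) for _ in range(trials)]

print("without dither:", quantize(4.8))
print("share of 4s:", outputs.count(4) / trials)
print("share of 5s:", outputs.count(5) / trials)
print("mean with dither:", sum(outputs) / trials)
```

Every individual output is still either 4 or 5, but the shares come out near 20% and 80%, so the long-run average recovers a value close to 4.8 that plain rounding throws away.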

The concept of dither applies to signals of all dimensions. In fact, it is easier to see its use when it comes to images. For example, if a grayscale image needs to be requantized to just two values, white and black, then regular quantization will result in large washed-out areas and large areas pushed to black, as shown below:

<figure>
    <img src="dither_original.png" width=400>
    <figcaption> Original image </figcaption>
</figure>

<figure>
    <img src="dither_threshold.png" width=400>
    <figcaption> Quantization without dither </figcaption>
</figure>

But if dither is added, after quantization, we still get an image that has only two levels, but the pixel densities of white and black pixels in areas of the image are such that, when human eye averages them, the result looks somewhat like a grayscale image.


<figure>
    <img src="dither_dither.png" width=400>
    <figcaption> Quantization with dither </figcaption>
</figure>
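
The same effect can be reproduced on a toy "image" of five flat gray patches. The sketch below uses random dither, which is an assumption: the figures above may have been made with a different dithering method. Pixel values run from 0.0 (black) to 1.0 (white), and all names are made up for illustration.

```python
import random

def threshold(pixel):
    """1-bit quantizer: white (1) at or above mid-gray, else black (0)."""
    return 1 if pixel >= 0.5 else 0

def random_dither(pixel, rng):
    """Add uniform noise in (-0.5, 0.5), then threshold."""
    return threshold(pixel + rng.uniform(-0.5, 0.5))

PATCHES, PIXELS_PER_PATCH = 5, 2000
rng = random.Random(0)
gray_levels = [c / (PATCHES - 1) for c in range(PATCHES)]  # 0, 0.25, ..., 1

plain_means = []   # average output per patch without dither
dither_means = []  # average output per patch with dither
for g in gray_levels:
    plain = [threshold(g) for _ in range(PIXELS_PER_PATCH)]
    dithered = [random_dither(g, rng) for _ in range(PIXELS_PER_PATCH)]
    plain_means.append(sum(plain) / PIXELS_PER_PATCH)
    dither_means.append(sum(dithered) / PIXELS_PER_PATCH)

for g, p, d in zip(gray_levels, plain_means, dither_means):
    print(f"gray {g:.2f}: plain {p:.2f}, dithered {d:.2f}")
```

Without dither each patch collapses to pure black or pure white. With dither the fraction of white pixels in a patch tracks its gray level, which is what the eye averages back into a shade of gray.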
