# Basics
+ An analog sinusoid is always periodic. However, when a sinusoid is sampled, the resulting discrete-time sinusoid is only periodic if the frequency $F$ of the analog sinusoid and the sampling frequency $F_S$ are such that, $\frac{F}{F_S} = \frac{M}{N}$, where $M$ and $N$ are integers. 
    + In such a case, the resulting signal is always periodic with a period of N samples:
    $$
    \begin{align}
    sin\left( 2\pi\dfrac{M}{N}(n+N) \right) &= sin\left( 2\pi\dfrac{M}{N}n +  2\pi M \right) \\
                                            &= sin\left( 2\pi\dfrac{M}{N}n \right)
    \end{align}
    $$
    
    + While the resulting signal is always periodic with a period of $N$ samples, it can also be periodic with a period that is an integer fraction of $N$. For example, if $F/F_S = 1/2$, then $M/N$ could be written as $1/2$ or $2/4$ or $4/8$ and so on. So if one chooses $N$ to be, say, 4, then the signal will repeat itself every 4 samples, but it will also repeat itself every 2 samples. 
    + The parent bullet and the sub-bullets above aren't of much practical utility.
    
+ An analog sinusoid of frequency $F$, after sampling at the rate of $F_S$, will be indistinguishable from the sampled version of an analog sinusoid of frequency $F+kF_S$, where $k$ is any integer. This is because,

$$
\begin{align}
sin\left( 2\pi\dfrac{F+kF_S}{F_S}n \right) &= sin\left( 2\pi\dfrac{F}{F_S}n + 2\pi kn\right) \\
                                           &= sin\left( 2\pi\dfrac{F}{F_S}n \right)
\end{align}
$$
    + This point is useful when we think about the DTFT (or DTFS or DFT): due to this indistinguishability, the spectrum repeats itself over and over at intervals of $F_S$.
    
+ Sine-in, sine-out property:
+ Real-life sinusoids aren't eternal: they start at time = 0 s. When such a sinusoid goes into a stable LTI system, after some time it is as if the sinusoid has existed forever. So the output will simply be a scaled (and phase-shifted) version of that sinusoid, with the complex scale factor given by the system's frequency response at that frequency. However, until that *some time* has passed, a transient signal will also be seen at the output along with this scaled sinusoid. This is the **transient response** of the system (due to the poles of the system), whereas the scaled sinusoid is the **steady-state response** (due to the poles of the input signal). Since the output is just a scaled version of the input, except for the transient response, which dies off sooner or later, non-eternal sinusoids can be thought of as "almost eigenfunctions" of an LTI system. It turns out that any non-eternal geometric progression signal of the form $q^n u[n]$, where $q$ is any complex number, is an "almost eigenfunction". Sinusoids are just a special case of this, where $q = e^{j\omega}$. More information can be found in [3.6.3 of CH3 of Fessler EECS451 notes](https://web.eecs.umich.edu/~fessler/course/451/l/pdf/c3.pdf). 
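
The three facts in this section can be checked numerically. What follows is a minimal sketch, not part of the original notes: the helper names and the one-pole system $y[n] = a\,y[n-1] + x[n]$ (with $H(e^{j\omega}) = 1/(1 - a e^{-j\omega})$) are assumptions picked only for illustration.

```python
import cmath
import math


def fundamental_period(M, N):
    """Smallest period (in samples) of sin(2*pi*(M/N)*n).

    Assumes the reduced denominator is above 2; otherwise the sampled
    sine is identically zero and any period works.
    """
    return N // math.gcd(M, N)


def sampled_sine(F, Fs, count):
    """First `count` samples of a sinusoid of frequency F sampled at Fs."""
    return [math.sin(2 * math.pi * F / Fs * n) for n in range(count)]


def one_pole_response(a, omega, count):
    """Feed x[n] = exp(j*omega*n)*u[n] into y[n] = a*y[n-1] + x[n]."""
    out, prev = [], 0j
    for n in range(count):
        prev = a * prev + cmath.exp(1j * omega * n)
        out.append(prev)
    return out


# Periodicity: F/Fs = 3/12 repeats every 12 samples, but also every 4.
print(fundamental_period(3, 12))  # 4

# Aliasing: F and F + k*Fs (here k = 2) give the same samples.
s1 = sampled_sine(1000.0, 8000.0, 32)
s2 = sampled_sine(1000.0 + 2 * 8000.0, 8000.0, 32)
print(max(abs(u - v) for u, v in zip(s1, s2)))  # tiny rounding error only

# Sine-in, sine-out: the gap between y[n] and H*x[n] is the transient.
a, omega = 0.9, 0.3
H = 1 / (1 - a * cmath.exp(-1j * omega))
y = one_pole_response(a, omega, 200)
for n in (0, 10, 50, 199):
    print(n, abs(y[n] - H * cmath.exp(1j * omega * n)))
```

The printed gap shrinks like $a^n$, which is the transient response dying off while the steady-state term $H(e^{j\omega})e^{j\omega n}$ remains.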


# DFT
+ The spectrum of a discrete-time aperiodic signal (the DTFT) is continuous in $\omega$ and periodic with period $2\pi$. The DFT samples this spectrum at the points $\omega_k = 2\pi \dfrac{k}{N}$, where $k = 0, 1, 2, \ldots, N-1$.
+ To accurately reconstruct a signal from its DFT, we need $N$ greater than or equal to the number of time-domain samples.
+ Every frequency represented by DFT (sampled by DFT) will have its negative-equivalent also sampled by DFT. 
    + *Proof:*
    $$
    \begin{align}
    2\pi\dfrac{k}{N} &\equiv -2\pi + 2\pi\dfrac{k}{N} \pmod{2\pi} \\
                     &= -2\pi \dfrac{N-k}{N} \text{, which means} \\
                   k &\equiv -(N-k) \pmod{N}
    \end{align}
    $$
    
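    Every bin $k > N/2$ is therefore the negative-frequency counterpart of bin $N-k < N/2$, which is also present in the DFT. The only bins without a distinct counterpart are $k=0$ and, for even $N$, $k=N/2$. This is universally true, whether $N$ is even or odd.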
    + Example 1: $N=8, F_s=8KHz$. This means, DFT will sample the frequencies \[0KHz, 1KHz, 2KHz, 3KHz, 4KHz, 5KHz, 6KHz, 7KHz\]. One can rewrite this with negative frequencies for $k > N/2$ using $k = -(N-k)$ as, \[0KHz, 1KHz, 2KHz, 3KHz, 4KHz, -(8-5)KHz, -(8-6)KHz, -(8-7)KHz\], which is same as \[0KHz, 1KHz, 2KHz, 3KHz, 4KHz, -3KHz, -2KHz, -1KHz\]
    + Example 2: $N=9, F_s = 9KHz$. Again, our DFT will sample, \[0KHz, 1KHz, 2KHz, 3KHz, 4KHz, -(9-5)KHz, -(9-6)KHz, -(9-7)KHz, -(9-8)KHz\], i.e., \[0KHz, 1KHz, 2KHz, 3KHz, 4KHz, -4KHz, -3KHz, -2KHz, -1KHz\]
    + Example 3: $N=8, F_s = 9KHz$. DFT will sample, \[0,  1.125KHz,  2.250KHz,  3.375KHz, 4.5KHz, -(9-5.625)KHz, -(9-6.75)KHz, -(9-7.875)KHz\], which is, \[0,  1.125KHz,  2.250KHz,  3.375KHz, 4.5KHz, -3.375KHz, -2.250KHz, -1.125KHz\]

+ When $N$ is an even number, $N/2$ will be an integer and hence be sampled by DFT. This corresponds to $\omega=\pi$. When $N$ is an odd number $N/2$ won't be an integer and hence won't be sampled by the DFT. 
+ To reconstruct a real valued signal from its DFT, we only need the positive half of the spectrum as the negative half will be just a complex conjugate of the positive half. 
    + For even $N$, this means we need $k=0,1,\ldots,N/2$, i.e., $N/2 + 1$ values. This is because $k=N/2$ is its own negative frequency. 
    + But when $N$ is odd, we need $k = 0,1, \ldots, truncate(N/2)$, i.e., $truncate(N/2) + 1$ frequencies. 
    + In general, we simply need the first $truncate(N/2)+1$ DFT coefficients, as $truncate(N/2)$ is equal to $N/2$ anyway when $N$ is even.
    + Example, when $N=8$, we need the first 5 DFT samples. When $N=9$, also, we need the first 5 DFT samples. 
+ One cannot calculate the Fourier transform of a noise signal as it isn't absolutely integrable. A noise signal cannot even be plotted in theory, as we don't know what the sample values will be; we only know the probabilities of the different possible values. But if the noise is wide-sense stationary (WSS), then the Wiener-Khinchin theorem says that the Fourier transform of the autocorrelation function $R_{X}(\tau)$ gives the "Power Spectral Density" (PSD), $S_{X}(f)$, of the random process $X$. Note that the PSD isn't the same as a spectrum. Just as with a probability density function, one can integrate the PSD over a band $[f_1, f_2]$ to find the signal power in that band. Here are a couple of useful links: [Link 1](https://homepages.wmich.edu/~bazuinb/ECE3800SW/SW_Notes09.pdf), [Link2](https://www.probabilitycourse.com/chapter10/10_2_1_power_spectral_density.php)
    + When a random process is WSS, zero-mean, and uncorrelated from one instant to the next, its autocorrelation is $R_X(\tau) = \sigma^2 \delta(\tau)$. So the PSD is flat over all frequencies, with a level equal to $\sigma^2$. This is basically white noise.
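
The bin-frequency examples and the conjugate-symmetry bullets above can be reproduced with a few lines of Python. This is only a sketch: `dft` is a naive $O(N^2)$ implementation and `bin_frequencies` is a hypothetical helper, neither of which comes from the original notes.

```python
import cmath


def dft(x):
    """Naive DFT: X[k] = sum_n x[n] * exp(-j*2*pi*k*n/N)."""
    N = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
        for k in range(N)
    ]


def bin_frequencies(N, Fs):
    """Frequency of every DFT bin, with bins k > N/2 written as -(N-k)*Fs/N."""
    return [k * Fs / N if k <= N / 2 else -(N - k) * Fs / N for k in range(N)]


print(bin_frequencies(8, 8000.0))  # Example 1
print(bin_frequencies(9, 9000.0))  # Example 2
print(bin_frequencies(8, 9000.0))  # Example 3

# For a real signal, X[N-k] is the complex conjugate of X[k], so the
# first N//2 + 1 coefficients carry all the information.
x = [0.5, -1.0, 2.0, 0.25, 3.0, -0.75, 1.5, 0.0, -2.0]  # real, N = 9
X = dft(x)
N = len(x)
print(max(abs(X[N - k] - X[k].conjugate()) for k in range(1, N)))
print(N // 2 + 1)  # 5 coefficients needed for N = 9
```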



    
# Audio
## General
+ Audible spectrum range is 20Hz to 20KHz. But the ear's channel response isn't flat in that band, nor is it monotonically increasing: Its sensitivity increases with frequency up to a point and then displays some bizarre behaviour as shown below (called Fletcher-Munson curves):

<img src="equal_loudness.png" width=400/>

+ Human speech lies in 125Hz to 8KHz range
+ The dynamic range of the ear is from 0dB (corresponding to 20 micropascal pressure on the diaphragm) to 120 dB (averaged across the bandwidth?). Noise above 90dB may cause hearing impairment.

## Measurements
+ Decibel level in audio is the sound level relative to the human hearing threshold, which is 20 micropascal of pressure on a diaphragm (so 0dB is the human hearing threshold). It is measured using a sound meter, in which a diaphragm converts pressure to electricity, and the decibel level is obtained as 

$$
\text{Decibel level} = 10log_{10}\left(\dfrac{noise\ power}{20\mu Pa\ power}\right)
$$

+ **Typical noise decibel levels**:

<img src="decibel_chart.jpg" width=400/>

So, compared to light rain (40 dB), a normal conversation (50 dB) is 10 times louder (i.e., 10 times more powerful, in energy per second) and a noisy restaurant is 1000 times louder!
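
The decibel arithmetic above can be sketched as follows. The function names are hypothetical, and the conversion from pressure assumes that power is proportional to pressure squared, which is why a factor of 20 appears instead of 10.

```python
import math

P_REF = 20e-6  # reference pressure in pascals (20 micropascal)


def spl_db(pressure_rms_pa):
    """Sound level in dB relative to 20 micropascal.

    Power goes as pressure squared, so 10*log10(power ratio)
    becomes 20*log10(pressure ratio).
    """
    return 20 * math.log10(pressure_rms_pa / P_REF)


def power_ratio(db_a, db_b):
    """How many times more powerful a db_a source is than a db_b source."""
    return 10 ** ((db_a - db_b) / 10)


print(spl_db(20e-6))             # hearing threshold, 0 dB
print(power_ratio(50, 40))       # conversation vs. light rain, 10x
print(power_ratio(40 + 30, 40))  # a source 30 dB above light rain, 1000x
```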

+ dBFS - decibels relative to full scale - is a measurement that is defined only for digital signals (and not analog signals). 0 dBFS is the maximum possible digital level; a signal that is half the range is -6dBFS and so on. The SQNR [formula for a quantizer](#Quantizer-noise), with a negative sign, represents the minimum noise floor in dBFS for a system that deals with an $n$-bit signal. For example, for a 16-bit signal, the noise floor is at least $-(6.02*16+1.761) \approx -98$ dBFS.

+ THD - Total harmonic distortion. When a sinusoid passes through an LTI system, the output is always a sinusoid of the same frequency as that of the input, but possibly with a different amplitude and phase. However, if a sinusoid passes through a non-LTI system, we cannot predict what the output will look like. But if the system is approximately LTI, then it makes sense to think that the output for a sinusoidal input will mostly contain the input's fundamental frequency, with the system's imperfections introducing extra harmonics. THD (fundamental) for a given sinusoid compares the total power in the harmonics to that of the fundamental, and is calculated from the RMS levels of the fundamental and the harmonics as:

$$
THD_F = \dfrac{\sqrt{V^2_2 + V^2_3 + V^2_4 + \ldots}}{V_1}
$$

$THD_F$ is typically represented in dB ($20log_{10}$ of the above quantity). Measures of THD aren't standardized: how many harmonics to include in the formula is itself a question. Besides that, it isn't meaningful to use a single sinusoid's $THD_F$ as a measure of a system's non-linearity. Different companies follow different methods while measuring the THD of their equipment (like microphones, headphones etc.). Some use a single reference frequency, some use a few evenly spaced across the bandwidth, and so on. 
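
As a small worked example of the $THD_F$ formula, here is a sketch in Python. The RMS levels used are made-up numbers for illustration, not measurements of any real device, and the function names are hypothetical.

```python
import math


def thd_f(v1_rms, harmonic_rms):
    """THD_F = sqrt(V2^2 + V3^2 + ...) / V1, all as RMS levels."""
    return math.sqrt(sum(v * v for v in harmonic_rms)) / v1_rms


def thd_f_db(v1_rms, harmonic_rms):
    """THD_F expressed in dB, i.e. 20*log10 of the ratio."""
    return 20 * math.log10(thd_f(v1_rms, harmonic_rms))


# Hypothetical levels: 1 V fundamental, 3 mV second and 4 mV third harmonic.
ratio = thd_f(1.0, [0.003, 0.004])
print(ratio)                           # about 0.005, i.e. 0.5 %
print(thd_f_db(1.0, [0.003, 0.004]))   # about -46 dB
```

Including more harmonics in `harmonic_rms` can only raise the figure, which is one reason THD numbers from different sources are hard to compare.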

+ THDN - Total harmonic distortion plus noise. It is measured by inputting a sinusoid into the system under test and using a notch filter at its output to remove the fundamental. The RMS of the filtered output is then compared to the RMS level of the fundamental, $V_1$. 

$$
THDN = \dfrac{\text{RMS of notch-filtered output}}{V_1}
$$

As with THD, unless accompanied by more information such as the fundamental frequencies used, the bandwidth of the output measurement etc., THDN alone won't make much sense.

+ THD and THDN are both applicable to analog as well as digital systems. In an ideal discrete-time LTI filter, there won't be any THD/THDN. But since digital filters are only nominally LTI due to finite register length effects, they will introduce harmonic distortion and quantization noise.

## Quantization
### Why increasing the sampling frequency by 4 times is the same as having 1 extra bit in the quantizer
Quantization (A/D conversion) at a sample rate Fs and a certain number
of bits N produces a total noise power Pn that is a function of N but not
a function of Fs. So, for any given N, you get Pn watts of noise power
no matter if the sample rate is 1 sample per second or 1E9 samples per
second. 

The next thing to understand is that the quantization noise power Pn
is spread evenly across all frequencies in the digital signal. So if
you're sampling at Fs samples per second, the noise will be Pn/(Fs/2)
watts per Hertz. 

For example, if Pn is 1 watt and Fs is 2000 samples/second, then there
will be Pn/(Fs/2) = 1 / 1000 = 0.001 watt/Hz of noise power in
the frequency domain. So in a 1 Hz bandwidth you'd have 1 milliwatt;
in a 10 Hz bandwidth you'd have 10 milliwatts; in the total bandwidth
of 1000 Hz you'd have 1 watt.

So, now we get to the point of your question. Let's say your total
signal power in the bandwidth of interest is Ps, the total noise power
due to quantization is Pn, and the signal bandwidth is B Hz. If you
sample at 2*B samples/second, then the amount of quantization noise
power Pnb in the signal bandwidth B is

  Pnb = B * (Pn / (Fs/2)) 
      = B * (Pn / B)
      = Pn

and the SNR is then

  SNR = Ps / Pnb
      = Ps / Pn

So there's nothing you can do to improve the SNR. 

However, if you oversample by some amount, say, Fs = M*2*B samples
per second, where M > 1, where B is the bandwidth of interest,
then the amount of quantization noise power Pnb in the signal bandwidth B is

  Pnb = B * (Pn / (Fs/2)) 
      = B * (Pn / (M*B))
      = Pn / M

and the SNR is then

  SNR = Ps / Pnb
      = M * Ps / Pn.

Thus you have improved your SNR by a factor of M. So, if M = 2, you
improve the SNR by 3 dB; if M = 4, you improve the SNR by 6 dB (or
1 bit); etc.

So, to answer your questions note the following:

  1. what is important is the SNR in the bandwidth of interest. 
  2. oversampling improves the SNR in the bandwidth of interest
  3. decimation (assuming you prefilter with the appropriate
  lowpass filter) does not destroy the SNR improvement in the
  bandwidth of interest.
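
The SNR arithmetic above can be written as a short Python sketch. The function names are hypothetical; the only facts used are the factor-of-M improvement derived above and the roughly 6.02 dB (that is, $20log_{10}2$) that one quantizer bit is worth.

```python
import math


def oversampling_snr_gain_db(M):
    """SNR improvement in the band of interest when Fs = M * 2 * B."""
    return 10 * math.log10(M)


def equivalent_extra_bits(M):
    """The same gain expressed in quantizer bits (20*log10(2) dB per bit)."""
    return oversampling_snr_gain_db(M) / (20 * math.log10(2))


for M in (2, 4, 16):
    print(M, oversampling_snr_gain_db(M), equivalent_extra_bits(M))
```

Each factor of 4 in M buys about one bit, so plain oversampling (without noise shaping) gets expensive quickly: 16 times oversampling gives only about 2 extra bits.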

### Quantizer noise
Quantizer noise is basically the rounding error (or truncation error) resulting from the very operation of quantizing. A quantizer can be modelled as a noise source that adds to the input signal to form the output signal. Quantization noise is correlated with the signal. Different types of signals produce different types of quantization noise. SQNR (Signal to Quantization Noise Ratio) derived for one type of signal is, in general, not applicable to a different type of signal. But some facts about quantization noise are the same for all types of input signals, as listed below:
 + The rounding error can never exceed 1/2 LSB in magnitude. So if the quantizer noise is treated as a random variable, its distribution, whatever it may be, will always be bounded. In other words, the noise cannot have a Gaussian distribution stretching from $-\infty$ to $\infty$.
 + Quantizer noise isn't frequency dependent.
 + For a given signal, as the number of quantizer bits increases, the quantization noise will decrease. 


**When input signal is a uniformly distributed RV**: Suppose the signal is uniformly distributed from 0 volts to Vmax volts. Then the signal power, i.e., the variance, will be $\dfrac{\left(V_{max}-0\right)^2}{12}$. The quantizer noise will also be uniformly distributed between $-\dfrac{1}{2} \dfrac{V_{max}}{2^n}$ volts and $\dfrac{1}{2} \dfrac{V_{max}}{2^n}$ volts, i.e., $Noise \sim U\left( -\dfrac{V_{max}}{2^{n+1}}, \dfrac{V_{max}}{2^{n+1}} \right)$, where $n$ is the number of bits in the quantizer. The noise power, i.e., its variance, is hence $\dfrac{\left(V_{max}/2^n\right)^2}{12}$. So the SQNR is:

$$
\begin{align}
SQNR &= 10log_{10}\left( \dfrac{V_{max}^2/12}{\left(V_{max}/2^n\right)^2/12} \right) \\
     &= 10log_{10}\left(2^n\right)^2 \\
     &= 20log_{10} 2^n \\
     &= n*20log_{10} 2 \\
     &\approx n\ *\ 6.02\ dB
\end{align}
$$

Note: Quantizer noise power is spread evenly (if the noise is uncorrelated) throughout the spectrum. So one way to reduce the quantization noise level relative to the signal is to sample at a large multiple of the Nyquist rate (this multiple is the OSR, the over-sampling ratio) and then low-pass filter the band of interest. More about this is discussed in the noise shaping section.
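
The $6.02n$ dB result for a uniformly distributed input can be checked with a small simulation. This is a sketch under stated assumptions: the quantizer maps each sample to the centre of its cell (so the error lies within half a step), the seed and sample count are arbitrary, and the helper names are hypothetical.

```python
import math
import random


def quantize_uniform(x, v_max, n_bits):
    """Map x in [0, v_max] to the centre of its quantization cell."""
    step = v_max / 2 ** n_bits
    index = min(int(x / step), 2 ** n_bits - 1)  # clip the top edge
    return (index + 0.5) * step


def measured_sqnr_db(samples, quantized):
    """10*log10(signal variance / mean squared quantization error)."""
    count = len(samples)
    mean = sum(samples) / count
    signal_power = sum((s - mean) ** 2 for s in samples) / count
    noise_power = sum((s - q) ** 2 for s, q in zip(samples, quantized)) / count
    return 10 * math.log10(signal_power / noise_power)


rng = random.Random(0)  # fixed seed so the run is repeatable
n_bits, v_max = 8, 1.0
xs = [rng.uniform(0.0, v_max) for _ in range(100_000)]
qs = [quantize_uniform(x, v_max, n_bits) for x in xs]

# Measured value next to the theoretical n * 20*log10(2).
print(measured_sqnr_db(xs, qs), n_bits * 20 * math.log10(2))
```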

**When input signal is a sine wave**: The quantization noise for a sine wave looks as shown below. When the number of bits $n$ is large, within any quantization voltage interval, the sine wave can be considered approximately linear. In this case, we will end up with more or less a sawtooth wave as shown below. 

<img src="sine_quantized.gif" width=640/>
<img src="sawtooth.png" width= 640/>

The RMS value of a sine wave is $A/\sqrt2$, where the sine wave is swinging from $-A$ to $A$. In our case, if the sine wave fits the full range of the quantizer, then $A = q*\dfrac{1}{2} 2^n$, where $q$ is the voltage level of 1 LSB. So the signal RMS value is $\dfrac{q*2^n}{2\sqrt{2}}$.

For the sawtooth noise, one can find the RMS value as:

$$
\begin{align}
RMS\ noise &= \sqrt{\dfrac{1}{T} \int_{-T/2}^{T/2} \left(\dfrac{qt}{T}\right)^2 dt} \\
           &= \sqrt{\dfrac{q^2}{T^3} \int_{-T/2}^{T/2} t^2 dt} \\
           &= \sqrt{\dfrac{q^2}{T^3} \left( \dfrac{\dfrac{T^3}{8}+\dfrac{T^3}{8}}{3} \right)} \\
           &= \dfrac{q}{\sqrt{12}}
\end{align}
$$

where we used the equation of a straight line inside the integral. 

The SQNR can be calculated as:

$$
\begin{align}
SQNR &= 20log_{10} \left( \dfrac{\dfrac{q*2^n}{2\sqrt{2}}}{\dfrac{q}{\sqrt{12}}} \right) \\
     &= 20log_{10} \left( 2^n \dfrac{\sqrt{12}}{\sqrt{8}} \right) \\
     &= 20log_{10} 2^n + 20log_{10} \left( \sqrt{\dfrac{3}{2}} \right) \\
     &\approx \left( 6.02n + 1.761 \right)\ dB
\end{align}
$$
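
The $6.02n + 1.761$ dB figure can be sanity-checked by quantizing a full-scale sine wave numerically. The sketch below is not from the original notes: the mid-rise quantizer, the arbitrary frequency of 0.1234567 rad/sample and the sample count are all assumptions, and the match is only approximate because the sawtooth model itself is an approximation.

```python
import math


def quantize_midrise(x, amplitude, n_bits):
    """Quantize x in [-amplitude, amplitude] to 2**n_bits cell centres."""
    step = 2 * amplitude / 2 ** n_bits  # q, the size of 1 LSB
    index = math.floor(x / step)
    index = max(-2 ** (n_bits - 1), min(index, 2 ** (n_bits - 1) - 1))
    return (index + 0.5) * step


def sine_sqnr_db(n_bits, num_samples=100_000, amplitude=1.0):
    """Measured SQNR of a full-scale sine after n_bits quantization."""
    signal_power = 0.0
    noise_power = 0.0
    for n in range(num_samples):
        x = amplitude * math.sin(0.1234567 * n)  # arbitrary frequency
        e = quantize_midrise(x, amplitude, n_bits) - x
        signal_power += x * x
        noise_power += e * e
    return 10 * math.log10(signal_power / noise_power)


for bits in (8, 10, 12):
    print(bits, sine_sqnr_db(bits), 6.02 * bits + 1.761)
```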

### Dither
When an analog signal is quantized, or when a digital signal is requantized to a lower resolution, the quantizer error is correlated with the signal, and the human ear perceives correlated noise as more irritating than even a higher level of uncorrelated error. Dither added to the signal *before* quantization results in an uncorrelated noise that is more acceptable. One way to think about dither: if you round the value 4.8, it always results in 5. However, if we add uniform noise in the range (-0.5, 0.5), the signal level 4.8 gets converted into a uniform R.V. varying between 4.3 and 5.3. This means that 20% of the time (when the dithered value falls between 4.3 and 4.5) the original value of 4.8 will be quantized to 4, and 80% of the time (when the dithered value falls between 4.5 and 5.3) to 5. If we average 20% of 4s and 80% of 5s, we get 4.8. So if the input signal is a sine wave, and one were to take the quantizer output, superimpose several periods and average the result, it would look like a smooth sine wave instead of a quantized, step-like sine wave. The averaging happens inside the human ear. 
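
The 4.8 example can be simulated directly. This is a minimal sketch with hypothetical helper names; the seed and the number of trials are arbitrary choices.

```python
import math
import random


def round_half_up(x):
    """Round to the nearest integer, with ties going up."""
    return math.floor(x + 0.5)


def dithered_rounds(value, trials, seed=1):
    """Round `value` many times, adding uniform(-0.5, 0.5) dither first."""
    rng = random.Random(seed)
    return [round_half_up(value + rng.uniform(-0.5, 0.5)) for _ in range(trials)]


value = 4.8
print(round_half_up(value))  # without dither: always 5

outputs = dithered_rounds(value, 100_000)
print(outputs.count(4) / len(outputs))  # close to 0.2
print(outputs.count(5) / len(outputs))  # close to 0.8
print(sum(outputs) / len(outputs))      # close to 4.8
```

Each individual output is still an integer; only the average recovers the original value, which is the averaging that the text attributes to the ear.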

The concept of dither applies to signals of all dimensions. In fact, it is easier to see its use when it comes to images. For example, if a grayscale image needs to be requantized to just two values, white and black, then regular quantization will result in large washed-out areas and large areas moved to black, as shown below:

<figure>
    <img src="dither_original.png" width=400>
    <figcaption> Original image </figcaption>
</figure>

<figure>
    <img src="dither_threshold.png" width=400>
    <figcaption> Quantization without dither </figcaption>
</figure>

But if dither is added, after quantization, we still get an image that has only two levels, but the pixel densities of white and black pixels in areas of the image are such that, when human eye averages them, the result looks somewhat like a grayscale image.


<figure>
    <img src="dither_dither.png" width=400>
    <figcaption> Quantization with dither </figcaption>
</figure>


# ADC
+ Two types: Nyquist rate converters (e.g., SAR - Successive Approximation Register) and oversampling converters (OSR) (e.g., Delta-sigma modulator)
+ In Nyquist rate converters, the analog signal passes through an (analog) anti-aliasing filter, a sample-and-hold circuit and then a multi-level quantizer. The problem is that the anti-aliasing filter needs to have a sharp roll-off and is often implemented with a non-linear phase response. The resulting phase distortion degrades the quality of the output. Also, the multi-level quantizer needs several reference voltages (as many as the number of levels), which is complex and hard to fabricate.
+ We have seen that, in the case of the quantizer noise being a uniform random variable, the noise RMS is $\dfrac{q}{\sqrt{12}}$. That makes the power $\dfrac{q^2}{12}$. Since the power is spread out evenly over the entire band, the power spectral density is flat (white) with a level of $\dfrac{q^2}{12f_s}$, where $f_s$ is the sampling frequency. When we use oversampling, there is a straightforward reduction of the noise power in the band of interest by a factor equal to the OSR. So if the OSR is 4, the in-band noise power reduces by a factor of 4.
+ The other advantage of oversampling is that, unlike in the case of Nyquist rate converters, the (analog) anti-aliasing filter need not have a sharp roll-off. The roll-off only needs to be such that, when sampled at $f_s*OSR$, the aliased components stop before the edge of the passband. Page 24 of [this doc](https://web.archive.org/web/20060621221221/http://digitalsignallabs.com/SigmaDelta.pdf) has some good illustrations, reproduced here for convenience (The entire doc is in drive->Mathematics->Statistics and Signal Processing folder):

<img src=anti-aliasing_Nyquist_OSR.png width=800/>

+ The signal is then subjected to a digital anti-aliasing filter before being decimated to $f_s$. This process essentially shifts stringent filter design requirements from the analog domain, where it is difficult, to the digital domain, where it is relatively easier. 

# Noise shaping
When quantizing from analog to digital or from a higher resolution digital to lower resolution digital, we can add dither and use OSR technique to get an acceptable noise floor in the signal band. Noise shaping is an additional technique that can be used to shape the quantization noise such that it is lower in the lower frequencies at the expense of higher noise at the upper frequencies. 

## Basic idea
Suppose we want to quantize the signal $x[n]$, we can characterize the output $y[n]$ as:

$$
y[n] = x[n] + q[n]
$$

where, $q[n]$ is the quantizer noise. The corresponding Z-transform looks like:

$$
Y(Z) = X(Z) + Q(Z)
$$

But instead of this, we want to subject the noise to a filtering operation and make the Z-transform to be:

$$
Y(Z) = X(Z) + W(Z)Q(Z)
$$

where, W(Z) is the noise shaping filter's transfer function (Typically the inverse of Fletcher-Munson [equal loudness contour](https://en.wikipedia.org/wiki/Equal-loudness_contour) is used). 

We can massage the above equation as:

$$
\begin{align}
\dfrac{Y(Z)}{W(Z)} &= \dfrac{X(Z)}{W(Z)} + Q(Z) \\
Y(Z) \left( 1+\dfrac{1}{W(Z)}-1 \right) &= \dfrac{X(Z)}{W(Z)} + Q(Z) \\
Y(Z) &= \dfrac{X(Z)}{W(Z)} + \left( 1-\dfrac{1}{W(Z)} \right)Y(Z) + Q(Z)
\end{align}
$$

The implementation will look like this:
<img src=noise_shaping1.png width=800 />

Alternatively, one can massage the original equation as:

$$
\begin{align}
\dfrac{Y(Z)}{W(Z)} &= \dfrac{X(Z)}{W(Z)} + Q(Z) \\
Y(Z) \left( 1+\dfrac{1}{W(Z)}-1 \right) &= X(Z) \left( 1+\dfrac{1}{W(Z)}-1 \right) + Q(Z) \\
Y(Z) &= X(Z) + \left( \dfrac{1}{W(Z)}-1 \right) \left( X(Z)-Y(Z) \right) + Q(Z)
\end{align}
$$

This implementation looks like this:
<img src=noise_shaping2.png width=900 />

The second implementation is more efficient as it uses only one filter. Since this filter sits in the loop, it is called the "loop filter". The simplest loop filter is an integrator. In that case, a difference operator comes first, followed by an integrator. It is hence called a **"Delta-Sigma converter"**. If the quantizer is a 1-bit quantizer, then the noise-shaper above becomes a PCM to PDM converter!
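
The second implementation can be sketched in Python for the simplest case. The choice $W(Z) = 1 - Z^{-1}$ is an assumption made here for illustration (it is not the equal-loudness-based filter mentioned earlier); with it, the loop filter $\frac{1}{W(Z)} - 1 = \frac{Z^{-1}}{1 - Z^{-1}}$ is an integrator with a one-sample delay. The function names are hypothetical.

```python
def one_bit(v):
    """1-bit quantizer with output levels -1 and +1."""
    return 1.0 if v >= 0.0 else -1.0


def first_order_noise_shaper(xs, quantize=one_bit):
    """Y = X + (1/W - 1)(X - Y) + Q with W(z) = 1 - z^-1.

    Returns the outputs y[n] and the quantizer errors q[n] = y[n] - v[n],
    where v[n] is the quantizer input.
    """
    ys, errors = [], []
    loop = 0.0       # loop-filter (integrator) state
    prev_diff = 0.0  # x[n-1] - y[n-1]
    for x in xs:
        loop += prev_diff  # delayed integrator acting on x - y
        v = x + loop
        y = quantize(v)
        ys.append(y)
        errors.append(y - v)
        prev_diff = x - y
    return ys, errors


# A constant input of 0.3 becomes a +/-1 stream whose density encodes 0.3.
ys, errors = first_order_noise_shaper([0.3] * 1000)
print(ys[:12])
print(sum(ys) / len(ys))  # close to 0.3
```

For this choice of $W$, the output satisfies $y[n] = x[n] + q[n] - q[n-1]$: the quantizer error is differenced, which pushes it towards high frequencies and away from DC.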

A few points of importance:
  + Noise cannot be suppressed over the entire band, or even over a large band, as described [here](https://wiki.hydrogenaud.io/index.php?title=Noise_shaping) and [here](https://www.dsprelated.com/showarticle/184.php). Apparently this limitation is related to rate-distortion theory and is proven by the Gerzon-Craven noise-shaping theorem. There is also some other advanced material in the second link. 
  + To make noise shaping meaningful, one must use dither. Without dither, the quantization noise depends heavily on the signal, so its level won't be uniform throughout the band and its spectrum is unpredictable; hence it would be unpredictable what the noise shaping will achieve. So it will look something like this (don't worry about the other blocks):

<img src=noise_shaping_dither.png width=800 />
  
  + Typically the input is over-sampled at some OSR, and then (we get $x[n]$ above, and) the whole process of noise shaping as described above is done. After the noise shaping (i.e., after we get $y[n]$), an anti-aliasing (digital) filter is applied and decimation is carried out. In the case of a PCM-to-PDM converter, the sigma-delta modulation is done at the transmitter and the resulting binary (analog) signal is sent out. At the receiver, a simple LPF (even an RC filter with poor rolloff) can be used because of the OSR and the resulting analog signal can directly be fed to a speaker or to an ADC to convert back to PCM. PWM is closely related to PDM (probably exactly the same...need to check).
      + The idea of combining OSR with dither makes easier sense if we think in terms of images: first resample the image with a higher number of pixels (a rectangular group of pixels will now have the same value as a single pixel in the original image), then add dither and quantize (check github dsp/tutorial/dither_image.py).
 + Sometimes, to explicitly show that the noise shaping process is causal, the filter is divided into a $z^{-1}$ (delay element) and the remaining is shown as a transfer function $H(Z)$. 


# Filtering
## FIR vs. IIR filter
  + An ideal LPF has an impulse response of 
  $$
  h[n] = \frac{\omega_c}{\pi}sinc\left( \dfrac{\omega_c}{\pi}n \right)\text{, where, } sinc(x) = \dfrac{sin(\pi x)}{\pi x}
  $$
      + This impulse response is not causal and is an infinite impulse response. So there is no FIR filter implementation for it. Furthermore, its $H(z)$ isn't rational, which means we cannot even implement it as an IIR filter with finite number of additions/multiplications and a feedback loop.
      + If we want a causal FIR implementation, we can get one by delaying the ideal impulse response and multiplying it by a window function that is non-zero only from $n=0$ to $n=w$, where $w$ is the width of that window.
  + An FIR filter can be designed with a linear phase response, while an IIR filter can't. But an FIR filter requires a much higher number of taps to get the same sharp roll-off as an equivalent IIR filter. This means the latency (group delay) suffered by signals going through an FIR filter would be much higher than the latency suffered going through an IIR filter.
  + Magnitude and phase of an FIR filter can be manipulated independently, whereas this is not possible with IIR filters
  + One can simply sample desired frequency response and build an FIR filter. This is not possible for an IIR filter
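
The windowing idea from the list above can be sketched as follows. The Hamming window, the tap count of 51, the cutoff of $0.25\pi$ and the final normalisation to unity DC gain are all assumptions chosen for illustration; the helper names are hypothetical.

```python
import cmath
import math


def sinc(x):
    """sinc(x) = sin(pi*x) / (pi*x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def windowed_sinc_lowpass(num_taps, cutoff):
    """Causal low-pass FIR: delayed ideal response times a Hamming window.

    `cutoff` is omega_c in radians/sample. The ideal response is delayed
    by (num_taps - 1) / 2 samples so that it fits into n = 0..num_taps-1.
    """
    centre = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        ideal = (cutoff / math.pi) * sinc((cutoff / math.pi) * (n - centre))
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]  # normalise the DC gain to 1


def gain_at(taps, omega):
    """Magnitude of the frequency response at omega radians/sample."""
    return abs(sum(h * cmath.exp(-1j * omega * n) for n, h in enumerate(taps)))


h = windowed_sinc_lowpass(51, 0.25 * math.pi)
print(gain_at(h, 0.1 * math.pi))  # passband, close to 1
print(gain_at(h, 0.5 * math.pi))  # stopband, small
```

The taps come out symmetric about the centre, which is what gives the linear phase; the price is a delay of `(num_taps - 1) / 2` samples, the latency mentioned in the comparison above.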

## General
  + [Apparently](https://www.earlevel.com/main/2003/02/28/biquads/) DF1 biquad is better for fixed point systems and transposed DF2 is better for floating point
  + Limit cycles refer to an IIR filter outputting oscillations forever, even without any input, due to quantization-related issues with the coefficients and the outputs of different stages of the IIR filter.

# Fixed-point processing
  + When multiplying two signed numbers, Qm.n and Qa.b, where 'Q' indicates the sign bit, the result will be QQ(m+a).(n+b), i.e., with two sign bits, for all possible combinations of the two numbers except when they are both the most negative number, that is 0x8000... In this case, the result will be Q(m+a+1).(n+b). In other words, one can't just assume that there will be a redundant sign bit and get rid of it by shifting up (as is done in some signal processors).
      + For example, 0x8000 in Q15 means -1. All other numbers represented by Q15 have their absolute values smaller than 1. So multiplying them will result in a number whose absolute value is also less than 1. But on the rare occasion when both numbers are -1, the result is +1. If we think the result will be QQ30, we will be wrong for just this one case. 
      + In some processors, multipliers recognise this rare occurrence and simply output the most positive number possible in QQ30 as the result, i.e., 0x3FFF FFFF. They then usually shift it up to Q31 format, ending up with 0x7FFF FFFF. Some TI processors do this.
      + In some other case, processors extend the operands by 1 bit before multiplication
  + Most often, multipliers are full-res. In other words, if the operands are 16x16, the result is a 32 bit number. It may be followed by a shifter before it goes into an accumulator. So one could multiply two 16 bit numbers, get a 32-bit number, shift it down by 16 bits to get a 16-bit number and then accumulate this number. It all depends on how big the accumulator is, how much precision is needed, how many multiplies are going to be accumulated before the accumulator is cleared. 
  + Fixed point is simply floating point with a fixed, implicit exponent. And this exponent doesn't necessarily have to be related to the number of fractional bits. For example, when we represent a number by Q15, it doesn't necessarily mean the original number (theoretical number) was multiplied by 2^15 and then rounded to give an integer. In fact, the exponent may be a power of 10 instead of a power of 2. But of course, a power of 2 is convenient (because one can shift numbers up/down instead of multiplying/dividing by a power of 2) and is standardised.
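
The Q15 corner case described above can be illustrated in Python, whose integers don't overflow, so the 32-bit limit has to be enforced by hand. This is a sketch of the saturating approach only, with hypothetical function names; it is not a model of any specific processor.

```python
Q15_MIN = -32768      # 0x8000, represents -1.0 in Q15
Q31_MAX = 0x7FFFFFFF  # largest positive Q31 value, just under +1.0


def q15_mul_to_q31(a, b):
    """Multiply two Q15 integers and return a saturated Q31 integer."""
    if a == Q15_MIN and b == Q15_MIN:
        # (-1) * (-1) = +1 does not fit in Q31, so saturate.
        return Q31_MAX
    product = a * b      # two sign bits followed by 30 fractional bits
    return product << 1  # drop the redundant sign bit to get Q31


def q31_to_float(v):
    """Interpret a Q31 integer as a real number."""
    return v / 2 ** 31


print(q31_to_float(q15_mul_to_q31(16384, 16384)))      # 0.5 * 0.5 = 0.25
print(q31_to_float(q15_mul_to_q31(Q15_MIN, 16384)))    # -1 * 0.5 = -0.5
print(q31_to_float(q15_mul_to_q31(Q15_MIN, Q15_MIN)))  # saturated, just under 1

# Without the special case, the shifted product would be 2**31, which is
# one more than the largest 32-bit signed value.
print((Q15_MIN * Q15_MIN) << 1 == 2 ** 31)
```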
      
## References
  1. [Microchip processor with 17x17 multiplication](https://microchipdeveloper.com/dsp0201:multiplier)
  2. [TI processor with 17x17 multiplication](https://www.ti.com/lit/ug/spru307a/spru307a.pdf?ts=1636107999152&ref_url=https%253A%252F%252Fwww.google.com%252F)