In [None]:
import numpy, scipy, matplotlib.pyplot as plt, IPython.display as ipd
import librosa, librosa.display

## Zero-Crossing Rate:
*   **Definition**: The Zero-Crossing Rate (ZCR) is the number of times the waveform crosses zero amplitude within a frame, normalized by the frame length.
*   **Mathematical Expression** (each sign change contributes 2 to the sum, hence the factor $\frac{1}{2}$):<br>
    \begin{align}
        ZCR = \frac{1}{2N} \sum_{n=1}^{N-1} |\text{sgn}(x[n]) - \text{sgn}(x[n-1])|
    \end{align}
Where:
  * $N$ is the total number of samples in the frame.
  * $x[n]$ is the amplitude of the signal at sample $n$.
  * $\text{sgn}(x)$ is the sign function, defined as:
    \begin{align}
    \text{sgn}(x) =
    \begin{cases}
    +1 & \text{if } x > 0, \\
    0 & \text{if } x = 0, \\
    -1 & \text{if } x < 0.
    \end{cases}
    \end{align}

  * Essentially, we are counting the number of times the signal changes from positive to negative or vice versa.
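
The counting idea can be sketched directly in NumPy. This is a minimal illustration, not librosa's implementation: the function name `zero_crossing_rate` and the test signal are assumptions made for this example, and the absolute sign difference is halved so that each sign change counts once.

```python
import numpy as np

def zero_crossing_rate(frame):
    """Number of sign changes in `frame`, divided by the frame length N."""
    signs = np.sign(np.asarray(frame, dtype=float))
    # |sgn(x[n]) - sgn(x[n-1])| equals 2 at a clean sign flip,
    # so dividing by 2 counts each crossing exactly once.
    crossings = np.sum(np.abs(np.diff(signs))) / 2
    return crossings / len(frame)

# Hypothetical test frame: 5 cycles of a sine in 1000 samples.
# The half-sample offset keeps every sample away from exactly zero.
n = np.arange(1000)
frame = np.sin(2 * np.pi * 5 * (n + 0.5) / 1000)

# The sine changes sign 9 times inside the frame, so the rate is 9 / 1000.
print(zero_crossing_rate(frame))
```

A frame that never changes sign, such as a constant signal, gives a rate of 0.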











In [None]:
%ls audio

By default, Librosa resamples all loaded audio files to a standard sample rate of 22050 Hz unless you specify otherwise.
Let's load a signal:

In [None]:
x, sr = librosa.load('audio/simple_loop.wav')

In [None]:
print(sr)

Listen to the signal:

In [None]:
ipd.Audio(x, rate=sr)

In [None]:
duration = librosa.get_duration(y=x, sr=sr)
print(f"Audio duration: {duration:.2f} seconds")

Plot the signal:

In [None]:
plt.figure(figsize=(14, 5))
librosa.display.waveshow(x, sr=sr)
plt.grid(True)

Let's zoom in:

In [None]:
n0 = 6500
n1 = 7500
plt.figure(figsize=(14, 5))
plt.plot(x[n0:n1])
plt.grid(True)
plt.axhline(y=0, color='red', linestyle='--', linewidth=2.5)

I count five zero crossings. Let's compute the zero crossings using librosa.

In [None]:
zero_crossings = librosa.zero_crossings(x[n0:n1], pad=False)

In [None]:
zero_crossings.shape

That computed a binary mask where `True` indicates the presence of a zero crossing. To find the total number of zero crossings, use `sum`:

In [None]:
print(zero_crossings.sum())  # NumPy's sum counts the True entries

To find the *zero-crossing rate* over time, use `zero_crossing_rate`:

Parameters:

*   `frame_length` (int, default=2048):
    * The length of each analysis frame in samples.
    * This defines the number of samples used to compute the ZCR for a single frame. Larger values provide a more smoothed result but reduce temporal resolution, while smaller values increase sensitivity to local changes in the signal.
    * Example: With a sample rate of 22050 Hz, a `frame_length` of 2048 corresponds to about $\frac{2048}{22050} \approx 0.093$ seconds per frame.

*   `hop_length` (int, default=512):
    * The number of samples between successive frames.
    * Smaller `hop_length` values create more overlapping frames, increasing temporal resolution, but also increase computational cost. Conversely, larger values reduce overlap, lowering temporal resolution.
    * Example: With `hop_length` = 512 and `sample_rate` = 22050, the hop duration is $\frac{512}{22050} \approx 0.023$ seconds per hop.
*   `center` (bool, default=True):
    * Determines whether the signal is padded so that each frame is centered on its nominal time position.
    * If `center = True`:
      * The signal is padded by `frame_length // 2` samples at the start and end. For the zero-crossing rate, librosa pads by repeating the edge values rather than with zeros.
      * This ensures that the first frame is centered on the start of the signal and the last frame covers the end of the signal.
    * If `center = False`:
      * The frames are taken directly from the signal without padding.
      * The first frame starts at the beginning of the signal.
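
To make the interaction of these parameters concrete, here is a small sketch of the frame arithmetic. The helper `frame_count` and the 3-second clip length are hypothetical, introduced only for this example. It assumes the usual framing rule of one frame per hop as long as a full frame fits.

```python
def frame_count(n_samples, frame_length=2048, hop_length=512, center=True):
    """Number of analysis frames under the framing rules described above."""
    if center:
        # Centering pads frame_length // 2 samples on each side.
        n_samples += 2 * (frame_length // 2)
    if n_samples < frame_length:
        return 0
    return 1 + (n_samples - frame_length) // hop_length

sr = 22050            # assumed sample rate (librosa's default)
n_samples = 3 * sr    # hypothetical 3-second clip

print(frame_count(n_samples, center=True))    # frames with padding
print(frame_count(n_samples, center=False))   # frames without padding
print(2048 / sr, 512 / sr)                    # frame and hop duration in seconds
```

With an even `frame_length` and `center=True`, the count reduces to `1 + n_samples // hop_length`, which is why the number of frames is close to the number of samples divided by the hop length.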



In [None]:
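print(duration)                     # duration in seconds
total_num_sample = duration * sr    # total number of samples (equals len(x))
print(total_num_sample)

# Approximate number of hops (frames) at hop_length=512
print(total_num_sample / 512)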

In [None]:
# x_2 = x[:2*sr]

zcrs = librosa.feature.zero_crossing_rate(x, frame_length=2048, hop_length=512, center=True)
print(zcrs.shape)
# plt.figure(figsize=(14, 5))
# plt.plot(zcrs[0])
# plt.grid(True)

Plot the zero-crossing rate:

In [None]:
zcrs_1 = librosa.feature.zero_crossing_rate(x, frame_length=2048, hop_length=512, center=True)
plt.figure(figsize=(14, 5))
plt.plot(zcrs_1[0])
plt.grid(True)

In [None]:
plt.figure(figsize=(14, 5))
librosa.display.waveshow(x, sr=sr)
plt.grid(True)

Note how the high zero-crossing rate corresponds to the presence of the snare drum.

### Does Zero-Crossing Rate Correspond to the Presence of a Snare Drum?
The zero-crossing rate measures how frequently the signal crosses the zero-amplitude level over time. This makes it useful for distinguishing between different types of sounds:

1.   Percussive Sounds & High Zero-Crossing Rate:

  *   The snare drum typically produces noisy, high-frequency content with many rapid fluctuations in the waveform. This results in a high number of zero crossings, making the zero-crossing rate a useful indicator of noisy percussive elements (like snares) in music.

2.   Sustained & Harmonic Sounds & Low Zero-Crossing Rate:

  *   Sustained instruments such as strings, vocals, and wind instruments tend to have smoother waveforms with fewer zero crossings. These signals do not fluctuate as frequently.
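
The contrast between the two cases can be illustrated with synthetic signals. This is a rough sketch under stated assumptions: white noise stands in for the noisy part of a snare hit, a 110 Hz sine stands in for a sustained tone, and the helper `zcr` is defined here for the example only (it normalizes by the number of adjacent sample pairs).

```python
import numpy as np

def zcr(signal):
    """Fraction of adjacent sample pairs whose signs differ."""
    signs = np.sign(signal)
    return np.mean(signs[1:] != signs[:-1])

sr = 22050
t = np.arange(sr) / sr                       # one second of time stamps
rng = np.random.default_rng(0)

noise = rng.standard_normal(sr)              # noise-like, stand-in for a snare
tone = np.sin(2 * np.pi * 110 * t + 0.1)     # sustained low tone

print(zcr(noise))   # close to 0.5 for white noise
print(zcr(tone))    # about 2 * 110 / 22050 for a pure tone
```

A pure tone crosses zero twice per period, so its rate grows with frequency. A low-pitched percussive sound such as a kick drum would therefore also show a low rate.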





The high rate near the beginning occurs because the near-silent signal oscillates around zero with very small amplitude:

In [None]:
plt.figure(figsize=(14, 5))
plt.plot(x[:1000])
plt.ylim(-0.0001, 0.0001)

A simple hack around this is to add a small constant before computing the zero crossing rate:

In [None]:
zcrs = librosa.feature.zero_crossing_rate(x + 0.0001)
plt.figure(figsize=(14, 5))
plt.plot(zcrs[0])
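
The effect of the offset can be checked on a synthetic noise floor. This sketch assumes a hypothetical near-silent signal with amplitude below `1e-5`. The helper `zcr` is defined for this example only.

```python
import numpy as np

def zcr(signal):
    """Fraction of adjacent sample pairs whose signs differ."""
    signs = np.sign(signal)
    return np.mean(signs[1:] != signs[:-1])

rng = np.random.default_rng(0)
# Hypothetical noise floor: tiny random values around zero.
near_silence = 1e-5 * rng.uniform(-1, 1, size=2048)

print(zcr(near_silence))           # high: the noise keeps flipping sign
print(zcr(near_silence + 1e-4))    # zero: the offset lifts every sample above 0
```

The offset only helps when it exceeds the amplitude of the noise floor, and it slightly shifts where louder parts of the signal cross zero. As a possible alternative, `librosa.zero_crossings` accepts a `threshold` argument that treats small values as zero. Whether that suits a given recording is something to verify experimentally.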

## Questions

Try other audio files. Does the zero-crossing rate still return something useful in polyphonic mixtures?

In [None]:
%ls audio

## RMS Energy:
*   **Definition**: RMS (root-mean-square) energy is the square root of the average power of a signal over a time window.
*   **Mathematical Expression**:<br>
    \begin{align}
      \text{RMS} = \sqrt{ \frac{1}{N} \sum_{n=0}^{N-1} x^2[n] }
    \end{align}

    

Where:
  * $x[n]$ are the audio sample values.
  * $N$ is the number of samples in a frame.

  * It is computed from the raw waveform and does not require a spectral representation.
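
The formula can be applied frame by frame with plain NumPy. This is a minimal sketch, not librosa's implementation: the function `frame_rms` and the test tone are assumptions for this example, and no padding is applied, so the number of frames can differ from what `librosa.feature.rms` returns with its default centering.

```python
import numpy as np

def frame_rms(x, frame_length=512, hop_length=512):
    """RMS of each frame of `x`, taken without padding."""
    x = np.asarray(x, dtype=float)
    n_frames = 1 + (len(x) - frame_length) // hop_length
    rms = np.empty(n_frames)
    for i in range(n_frames):
        start = i * hop_length
        frame = x[start:start + frame_length]
        rms[i] = np.sqrt(np.mean(frame ** 2))   # sqrt of the mean square
    return rms

sr = 22050
t = np.arange(sr) / sr
sine = 0.5 * np.sin(2 * np.pi * 441 * t)   # hypothetical tone, amplitude 0.5

rms = frame_rms(sine)
print(rms.shape)
print(rms[:3])   # each value is close to 0.5 / sqrt(2)
```

For a sine wave of amplitude $A$, the RMS over many periods approaches $A / \sqrt{2}$, which gives a quick sanity check for the implementation.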

In [None]:
# RMS (Root Mean Square) energy
rms = librosa.feature.rms(y=x, frame_length=512, hop_length=512)
print(rms.shape)

In [None]:
plt.figure(figsize=(14, 5))
plt.plot(rms[0])
plt.grid(True)

plt.figure(figsize=(14,5))
librosa.display.waveshow(y=x, sr=sr)
plt.grid(True)

## What Does RMS Energy Represent?

*   The bottom plot is the waveform of the audio signal, showing raw amplitude variations over time.
*   The top plot represents RMS energy, a smoother representation of loudness over time.
*   Loudness & Dynamics: Higher RMS values correspond to louder sounds, while lower values indicate quieter regions.
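
To compare the two plots on the same time axis, each RMS frame index can be converted to seconds. The sketch below assumes a hop length of 512 samples at 22050 Hz and a hypothetical count of 10 frames. `librosa.frames_to_time` performs the same kind of conversion.

```python
import numpy as np

sr = 22050          # assumed sample rate
hop_length = 512    # assumed hop length used for the RMS frames
n_frames = 10       # hypothetical number of frames

# Frame i starts i * hop_length samples into the signal.
frame_times = np.arange(n_frames) * hop_length / sr
print(frame_times)
```

Plotting the RMS values against `frame_times` instead of the frame index puts the energy curve on the same horizontal axis as the waveform plot.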

### Why Do They Match?

