In [6]:
import numpy as np
import librosa
import matplotlib.pyplot as plt

# PRE-EMPHASIS

Pre-emphasis is the first stage of Mel Frequency Cepstral Coefficient (MFCC) extraction. It applies a first-order high-pass filter that emphasizes the high-frequency components of the audio signal relative to the low-frequency ones, which improves the signal before its features are extracted.

$$y(n) = x(n) - \alpha \, x(n-1)$$

- $x(n)$ is the input signal at sample index $n$.
- $y(n)$ is the signal after pre-emphasis.
- $\alpha$ is the *pre-emphasis* coefficient, in the range 0 to 1. Commonly used values of $\alpha$ are about 0.95 to 0.97; the function below defaults to 0.97.
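To see why this difference equation acts as a high-pass filter, one can evaluate its frequency response $H(e^{j\omega}) = 1 - \alpha e^{-j\omega}$ at the two extremes. The sketch below is illustrative only: the helper name `pre_emphasis_gain` and the choice $\alpha = 0.97$ are assumptions, not part of the original notebook.

```python
import numpy as np

alpha = 0.97  # assumed coefficient, chosen for illustration

def pre_emphasis_gain(omega, alpha):
    """Magnitude of H(e^{jw}) = 1 - alpha * e^{-jw} at frequency omega (rad/sample)."""
    return np.abs(1.0 - alpha * np.exp(-1j * omega))

gain_dc = pre_emphasis_gain(0.0, alpha)          # lowest frequency: 1 - alpha
gain_nyquist = pre_emphasis_gain(np.pi, alpha)   # highest frequency: 1 + alpha
print(gain_dc, gain_nyquist)
```

The gain is about 0.03 at zero frequency and about 1.97 at the Nyquist frequency, so slowly varying content is strongly attenuated while rapidly varying content is roughly doubled.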

In [2]:
def pre_emphasis__(signal, coefficient=0.97):
  # y(n) = x(n) - coefficient * x(n-1); the first sample has no predecessor and is kept unchanged
  return np.append(signal[0], signal[1:] - coefficient * signal[:-1])
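As a cross-check, the same filter can be written as a convolution with the kernel $[1, -\alpha]$, truncated to the input length. This is a minimal sketch using NumPy only; the function name `pre_emphasis_conv` and the toy signal are hypothetical and serve only to compare the two formulations.

```python
import numpy as np

def pre_emphasis_conv(signal, coefficient=0.97):
    # Full convolution with [1, -coefficient], truncated to the input length
    return np.convolve(signal, [1.0, -coefficient])[:len(signal)]

x = np.array([1.0, 2.0, 3.0, 4.0])  # toy input
y_conv = pre_emphasis_conv(x)

# Direct form of y(n) = x(n) - alpha * x(n-1), keeping the first sample unchanged
y_direct = np.append(x[0], x[1:] - 0.97 * x[:-1])

print(y_conv)
print(np.allclose(y_conv, y_direct))
```

For this input both forms give approximately `[1.0, 1.03, 1.06, 1.09]`: the steadily rising ramp is flattened to small, nearly constant differences.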

# FRAME BLOCKING

In frame blocking, the speech signal is split into short segments called frames, with consecutive frames overlapping one another. The overlap keeps information near the frame boundaries from being lost or cut off when the signal is divided. Framing continues until the entire audio signal has been covered. Splitting the signal into frames gives a more detailed, localized representation that is easier for sound processing algorithms to handle. It also turns signals of different durations into sequences of fixed-length frames, which makes feature extraction more consistent and reliable for tasks such as speech recognition or audio analysis.

$$\text{frames} = \left\lceil \frac{I-N}{M} \right\rceil + 1$$

Description:
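- $I$ is the total number of samples in the signal (for a one-second signal, this equals the *sampling rate*).
- $N$ is the *frame size* in samples.
- $M$ is the *frame step* in samples, i.e. the shift between the starts of consecutive frames (with 50% overlap, this equals the length of the *overlap*).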
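A short worked example of the frame count may help. The sketch below assumes a 3-second signal sampled at 22050 Hz, with 1-second frames and a 0.5-second step; these numbers and the helper name `count_frames` are illustrative assumptions. The result is rounded up so that a partial final frame is still counted, as the framing code in this notebook does.

```python
import math

def count_frames(num_samples, frame_length, frame_step):
    # (I - N) / M + 1, rounded up so a partial last frame is counted
    return math.ceil(max(num_samples - frame_length, 0) / frame_step) + 1

sr = 22050      # assumed sampling rate in Hz
I = 3 * sr      # 3 seconds of audio -> 66150 samples
N = sr          # 1-second frame -> 22050 samples
M = sr // 2     # 0.5-second step -> 11025 samples

n_frames = count_frames(I, N, M)
print(n_frames)  # 5
```

Here $(66150 - 22050) / 11025 + 1 = 5$, so the signal is covered by five frames, each sharing half of its samples with the next one.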

In [None]:
def framing__(signal, sr, frame_size=1, frame_stride=0.5):
  # frame_size and frame_stride are given in seconds; convert both to samples
  frame_length, frame_step = int(round(frame_size * sr)), int(round(frame_stride * sr))
  signal_length = len(signal)
  # Clamp at 0 so a signal shorter than one frame yields exactly one (padded) frame
  num_frames = int(np.ceil(max(signal_length - frame_length, 0) / frame_step)) + 1

  # Zero-pad the tail so that the last frame has the full frame_length
  pad_signal_length = (num_frames - 1) * frame_step + frame_length
  z = np.zeros((pad_signal_length - signal_length))
  pad_signal = np.append(signal, z)
  # Row i of indices holds the sample positions belonging to frame i
  indices = np.tile(np.arange(0, frame_length), (num_frames, 1)) + np.tile(np.arange(0, num_frames * frame_step, frame_step), (frame_length, 1)).T
  framed_signal = pad_signal[indices.astype(np.int32, copy=False)]
  return framed_signal
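The index matrix that the function builds with `np.tile` can also be produced by broadcasting, which makes its structure easier to see on a toy input. The following is a sketch under assumed sizes (a 10-sample signal, frames of 4 samples, step of 2 samples) chosen so that no padding is needed; it is not a replacement for the function above.

```python
import numpy as np

signal = np.arange(10)            # toy signal: 0, 1, ..., 9
frame_length, frame_step = 4, 2   # assumed sizes, in samples

# The signal divides evenly into frames here, so no zero-padding is required
num_frames = (len(signal) - frame_length) // frame_step + 1

# Row i = offsets 0..frame_length-1 shifted by i * frame_step
indices = np.arange(frame_length)[None, :] + frame_step * np.arange(num_frames)[:, None]
frames = signal[indices]
print(frames)
```

This gives four frames, `[0 1 2 3]`, `[2 3 4 5]`, `[4 5 6 7]` and `[6 7 8 9]`: each frame starts two samples after the previous one and shares two samples with it.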