# SpecTTTra: Extracting Mel-Spectrogram from Audio Input

## 1. Import Required Libraries

Import essential Python libraries such as `librosa` for audio processing, `matplotlib` for visualization, and `numpy` for numerical operations.

In [None]:
import librosa
import librosa.display  # specshow lives here; older librosa versions need the explicit import
import IPython.display as ipd
import numpy as np
import matplotlib.pyplot as plt

## 2. Load and Play Audio File
Specify the path to the audio file and use `IPython.display.Audio` to play the sound directly inside the notebook.

In [None]:
audio_file = "audio/folded.wav"
ipd.Audio(audio_file)

## 3. Read Audio Data Using Librosa
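Load the audio file as a time-series waveform and retrieve its sampling rate using `librosa.load()`. By default, `librosa.load()` resamples to 22,050 Hz and mixes down to mono; pass `sr=None` to keep the file's native sampling rate.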

In [None]:
# Load audio file with librosa
folded, sr = librosa.load(audio_file)
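
If `audio/folded.wav` is not at hand, a synthetic tone can stand in for the loaded waveform in the later steps. The sketch below is an illustration only: the 440 Hz tone, the two-second duration, and the names `tone` and `sr_demo` are assumptions, with `sr_demo` set to 22,050 Hz to match the rate `librosa.load()` resamples to by default.

```python
import numpy as np

sr_demo = 22050      # assumed: matches librosa.load()'s default target rate
duration_s = 2.0     # assumed length of the stand-in signal, in seconds
t = np.arange(int(sr_demo * duration_s)) / sr_demo

# Mono float32 waveform within [-1, 1], the same layout librosa.load() returns
tone = (0.5 * np.sin(2 * np.pi * 440.0 * t)).astype(np.float32)

print(tone.shape, tone.dtype)   # number of samples and sample dtype
print(len(tone) / sr_demo)      # duration in seconds = samples / sampling rate
```

Dividing the number of samples by the sampling rate gives the duration in seconds, which applies equally to `folded` and `sr`.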

## 4. Compute Short-Time Fourier Transform (STFT)
Define frame and hop sizes, then use `librosa.stft()` to convert the time-domain signal into a time–frequency representation.

In [None]:
FRAME_SIZE = 2048
HOP_SIZE = 512
S_folded = librosa.stft(folded, n_fft=FRAME_SIZE, hop_length=HOP_SIZE)
S_folded.shape
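
The resulting shape can be predicted from the parameters. With the default `center=True` padding of `librosa.stft()`, the output has `1 + n_fft // 2` frequency bins and `1 + len(y) // hop_length` frames. The helper `expected_stft_shape` below is a hypothetical name introduced for illustration, and the one-second signal at 22,050 Hz is an assumed example, not a property of `folded.wav`.

```python
def expected_stft_shape(n_samples, n_fft=2048, hop_length=512):
    """Predict the STFT shape, assuming librosa's default center=True padding."""
    n_bins = 1 + n_fft // 2                 # non-negative frequencies only
    n_frames = 1 + n_samples // hop_length  # one frame per hop, plus the first
    return n_bins, n_frames

sr_demo = 22050                      # assumed sampling rate for the example
print(expected_stft_shape(sr_demo))  # (bins, frames) for one second of audio
print(sr_demo / 2048)                # spacing between frequency bins, in Hz
print(512 / sr_demo * 1000)          # time step between frames, in ms
```

A larger `FRAME_SIZE` narrows the frequency bins at the cost of time resolution, while a smaller `HOP_SIZE` produces more frames.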

## 5. Check the Data Type of STFT Output
Verify the type of the STFT array elements to understand the data structure: each element is a complex number that encodes the magnitude and phase of one time-frequency bin.

In [None]:
type(S_folded[0][0])
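
To make the magnitude and phase reading concrete, here is a minimal sketch on a single hand-picked value. The number `3 + 4j` is an assumption chosen for easy arithmetic and is not taken from `S_folded`.

```python
import numpy as np

z = np.complex64(3 + 4j)   # assumed example value, same dtype family as an STFT bin

magnitude = np.abs(z)      # length of the complex vector
phase = np.angle(z)        # angle of the complex vector, in radians

# Magnitude and phase together recover the original complex number
reconstructed = magnitude * np.exp(1j * phase)

print(magnitude, phase, reconstructed)
```

The same `np.abs()` and `np.angle()` calls work element-wise on the whole STFT array.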

## 6. Convert STFT to Power Spectrogram
Compute the power (energy) of the spectrogram by taking the squared magnitude of the STFT values.

In [None]:
Y_folded = np.abs(S_folded) ** 2
Y_folded.shape

## 7. Inspect the Data Type of Power Spectrogram
Confirm the new data type of the power spectrogram after magnitude-squaring the STFT output.

In [None]:
type(Y_folded[0][0])
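
The squared magnitude equals the sum of the squared real and imaginary parts, and the result is real-valued. A small sketch on a made-up complex array shows both points (`S_demo` and its values are assumptions for illustration, not STFT output):

```python
import numpy as np

# Assumed toy "STFT" with the complex64 dtype
S_demo = np.array([[3 + 4j, 1 - 1j],
                   [0 + 2j, -2 + 0j]], dtype=np.complex64)

Y_demo = np.abs(S_demo) ** 2                  # power, as computed for Y_folded
manual = S_demo.real ** 2 + S_demo.imag ** 2  # same quantity from its definition

print(Y_demo)
print(Y_demo.dtype)                # real floating-point dtype, phase is discarded
print(np.allclose(Y_demo, manual))
```

Because the phase is discarded, the power spectrogram alone cannot be inverted back to the original waveform exactly.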

## 8. Define a Function to Plot Spectrograms
Create a helper function `plot_spectrogram()` that uses `librosa.display.specshow()` to visualize a spectrogram with a configurable frequency axis and a colorbar.

In [None]:
def plot_spectrogram(Y, sr, hop_length, y_axis="linear"):
    plt.figure(figsize=(25, 10))
    librosa.display.specshow(Y, 
                             sr=sr, 
                             hop_length=hop_length, 
                             x_axis="time", 
                             y_axis=y_axis)
    plt.colorbar(format="%+2.f")

## 9. Visualize the Linear-Frequency Spectrogram
Plot the computed power spectrogram to visualize frequency energy distribution over time using a linear frequency scale.

In [None]:
plot_spectrogram(Y_folded, sr, HOP_SIZE)

## 10. Convert Power Spectrogram to Decibel Scale
Apply logarithmic scaling via `librosa.power_to_db()`, which approximates how humans perceive loudness, then plot the result.

In [None]:
Y_log_folded = librosa.power_to_db(Y_folded)
plot_spectrogram(Y_log_folded, sr, HOP_SIZE)
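
The conversion itself is `10 * log10(power / ref)`. The sketch below is a simplified NumPy re-implementation for illustration, not librosa's source; the function name `power_to_db_sketch` is hypothetical, and the defaults `ref=1.0`, `amin=1e-10`, and `top_db=80.0` are assumed to match the documented defaults of `librosa.power_to_db()`.

```python
import numpy as np

def power_to_db_sketch(power, ref=1.0, amin=1e-10, top_db=80.0):
    """Convert power to dB: 10 * log10(power / ref), floored and range-limited."""
    power = np.asarray(power, dtype=float)
    # Floor tiny values at amin so log10 never sees zero
    log_spec = 10.0 * np.log10(np.maximum(amin, power))
    log_spec -= 10.0 * np.log10(np.maximum(amin, ref))
    if top_db is not None:
        # Clip everything more than top_db below the peak
        log_spec = np.maximum(log_spec, log_spec.max() - top_db)
    return log_spec

demo_power = np.array([1.0, 10.0, 100.0, 1e-12])  # assumed example values
print(power_to_db_sketch(demo_power))
```

Each tenfold increase in power adds 10 dB, which compresses the large dynamic range that makes the linear power plot look mostly empty.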

## 11. Display the Log-Scaled Spectrogram on a Log-Frequency Axis
Re-plot the decibel-scaled spectrogram using a logarithmic frequency axis to highlight both low- and high-frequency components more effectively.

In [None]:
plot_spectrogram(Y_log_folded, sr, HOP_SIZE, y_axis="log")