# Pre-Emphasis

#### Import required libraries

In [None]:
import numpy as np
import scipy.io.wavfile as wav
import matplotlib.pyplot as plt
from scipy.signal import lfilter
from scipy.signal.windows import hamming
from scipy.linalg import solve_toeplitz

#### Formant Detection Using LPC

##### **Step 1. Load and Prepare the Audio Signal**
The audio file is loaded with `scipy.io.wavfile.read()` (imported as `wav`), which returns both the sample rate and the audio samples. If the audio is stereo (more than one channel), the channels are averaged to create a mono signal. The signal is then normalized to the range `[-1, 1]` by dividing it by its maximum absolute value, so that different recordings are processed at the same peak level.
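The loading steps above can be collected into one reusable function. This is a minimal sketch; `load_mono_normalized` is a hypothetical helper name, not part of SciPy. It converts to floating point before taking `np.abs()` because 16-bit WAV files are read as `int16`, and it leaves an all-zero file unchanged instead of dividing by zero.

```python
import numpy as np
import scipy.io.wavfile as wav


def load_mono_normalized(path):
    """Load a WAV file, mix it down to mono, and peak-normalize it to [-1, 1]."""
    sampling_rate, audio = wav.read(path)
    audio = audio.astype(np.float64)  # Convert first so np.abs cannot overflow int16
    if audio.ndim > 1:
        audio = audio.mean(axis=1)    # Average all channels into one
    peak = np.max(np.abs(audio))
    if peak > 0:                      # Leave a silent file unchanged
        audio = audio / peak
    return sampling_rate, audio
```

With this helper, the loading lines in each cell reduce to `sampling_rate, audio = load_mono_normalized(filename)`.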

##### **Step 2. Compute the Spectrogram**
A spectrogram is plotted with `plt.specgram()` (2048-point FFT, 1024-sample overlap) to show how the frequency content of the signal changes over time. During voiced speech, formants appear as horizontal bands of high energy, so the spectrogram serves as a visual reference for checking the LPC estimates.
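If you need the spectrogram values instead of only a plot, for example to compare energy at the detected formants, the same analysis can be computed with `scipy.signal.spectrogram`. This is a sketch under assumptions: `speech_spectrogram_db` is a hypothetical helper, the Hann window mirrors the default window of `plt.specgram()`, and the sample rate of 16 kHz is only used to illustrate the resolution arithmetic.

```python
import numpy as np
from scipy.signal import spectrogram


def speech_spectrogram_db(audio, sampling_rate, nfft=2048, noverlap=1024):
    """Return frequencies [Hz], frame times [s] and power in dB, using the same
    FFT size and overlap as the plt.specgram() call in the notebook."""
    freqs, times, sxx = spectrogram(audio, fs=sampling_rate, window='hann',
                                    nperseg=nfft, noverlap=noverlap)
    return freqs, times, 10 * np.log10(sxx + 1e-12)  # Small offset avoids log(0)


sampling_rate = 16000  # Assumed rate for the arithmetic below
print("Bin spacing:", sampling_rate / 2048, "Hz")
print("Window length:", 2048 / sampling_rate * 1000, "ms")
print("Hop:", (2048 - 1024) / sampling_rate * 1000, "ms")
```

At 16 kHz a 2048-point window spans 128 ms, much longer than the 25 ms LPC frames. Such a narrowband spectrogram resolves individual pitch harmonics, and the formants show up as the regions where those harmonics are strongest.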

##### **Step 3. Formant Detection Using LPC**
The code performs LPC analysis to detect formants in the audio signal. The audio is processed in frames, where each frame has a length of 25 milliseconds (`frame_length`) and advances by 10 milliseconds (`frame_step`). A Hamming window is applied to each frame to minimize spectral leakage during analysis.
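The framing described above, together with the pre-emphasis filter named in the notebook title, can be sketched as two small helpers. This is a minimal sketch under stated assumptions. The names `pre_emphasize` and `make_frames` are hypothetical. The value `alpha = 0.97` is a commonly used coefficient and is not taken from this notebook. Pre-emphasis, `y[n] = x[n] - alpha * x[n-1]`, boosts high frequencies to offset the downward spectral tilt of voiced speech before the LPC fit.

```python
import numpy as np
from scipy.signal import lfilter
from scipy.signal.windows import hamming


def pre_emphasize(audio, alpha=0.97):
    """First-order pre-emphasis y[n] = x[n] - alpha * x[n-1] (alpha is an assumed, common value)."""
    return lfilter([1.0, -alpha], [1.0], audio)


def make_frames(audio, sampling_rate, frame_ms=25, step_ms=10):
    """Split audio into overlapping Hamming-windowed frames (one frame per row).

    Also returns the start time of each frame in seconds.
    """
    frame_length = frame_ms * sampling_rate // 1000  # Integer arithmetic: 400 samples at 16 kHz
    frame_step = step_ms * sampling_rate // 1000     # 160 samples at 16 kHz
    starts = np.arange(0, len(audio) - frame_length + 1, frame_step)
    frames = np.stack([audio[s:s + frame_length] for s in starts])
    return frames * hamming(frame_length), starts / sampling_rate
```

The `+ 1` in the range ensures that a final frame ending exactly at the last sample is included.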

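The autocorrelation of each windowed frame is computed, and the LPC coefficients are obtained by solving the resulting symmetric Toeplitz system (the Yule-Walker equations) with `solve_toeplitz()`. The roots of the LPC polynomial are then calculated, keeping only roots with positive imaginary parts (one root from each complex-conjugate pair). Each root's angle is converted to a frequency in Hz with `angle * sampling_rate / (2 * np.pi)`, and the lowest `num_formants` (here three) frequencies are taken as the formants. The frame start times (in seconds) and the formant frequencies are stored in the `formants_time` and `formants_freqs` lists, respectively, for later visualization.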
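Taking the lowest root angles as formants can pick up spurious roots, such as very low-frequency roots or broad roots that only model spectral tilt. A common refinement is to also use each root's magnitude, which sets its bandwidth, and to drop roots that are too low or too broad. The sketch below applies that idea to one frame. `lpc_formants` is a hypothetical helper, and the thresholds of 90 Hz and 400 Hz are common heuristics, not values from this notebook.

```python
import numpy as np
from scipy.linalg import solve_toeplitz


def lpc_formants(frame, sampling_rate, order=None, num_formants=3,
                 min_freq=90.0, max_bandwidth=400.0):
    """Estimate formant frequencies of one windowed frame via autocorrelation LPC.

    Roots below min_freq or with a bandwidth above max_bandwidth are discarded.
    Returns num_formants values in Hz, NaN-padded if fewer were found.
    """
    if order is None:
        order = 2 + sampling_rate // 1000
    autocorr = np.correlate(frame, frame, mode='full')[len(frame) - 1:]  # Lags 0, 1, ...
    if autocorr[0] <= 0:  # Silent frame: the Toeplitz system would be singular
        return np.full(num_formants, np.nan)
    a = solve_toeplitz(autocorr[:order], autocorr[1:order + 1])  # Predictor coefficients
    roots = np.roots(np.concatenate(([1.0], -a)))                # Roots of A(z)
    roots = roots[np.imag(roots) > 0]                            # One root per conjugate pair

    freqs = np.angle(roots) * sampling_rate / (2 * np.pi)
    bandwidths = -np.log(np.abs(roots)) * sampling_rate / np.pi  # Approximate 3 dB bandwidth
    keep = (freqs > min_freq) & (bandwidths < max_bandwidth)
    formants = np.sort(freqs[keep])[:num_formants]
    return np.pad(formants, (0, num_formants - len(formants)), constant_values=np.nan)
```

Returning a fixed-length, NaN-padded array keeps `np.array(formants_freqs)` rectangular, so the overlay loop in Step 4 works even when a frame yields fewer than three formants.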

##### **Step 4. Overlay Formants on the Spectrogram**
The detected formants are overlaid on the previously computed spectrogram for visual comparison. Each formant's frequency is plotted against its corresponding time point, allowing for an analysis of how well the formants align with the spectral features in the audio signal.

#### **Tips**:
- Test with both speech signals!

In [None]:
# Step 1: Load the speech sample
path_folder = "audio_samples_07/"
file_name = 'FILE_NAME'   # Replace with your file path
file_ext = '.wav'

filename = path_folder + file_name + file_ext
sampling_rate, audio = wav.read(filename)

# Step 2: Check if audio is stereo and convert to mono if necessary
if audio.ndim > 1:  # Check if audio has more than one channel
    audio = audio.mean(axis=1)  # Average the two channels to create mono audio

# Step 3: Convert to floating point (avoids int16 overflow in np.abs) and normalize to [-1, 1]
audio = audio.astype(np.float64)
audio = audio / np.max(np.abs(audio))

# Step 4: Compute the spectrogram
plt.figure(figsize=(10, 6))
plt.specgram(audio, Fs=sampling_rate, NFFT=2048, noverlap=1024, cmap='plasma')
plt.title('Spectrogram of Speech Sample')
plt.xlabel('Time [s]')
plt.ylabel('Frequency [Hz]')

# Step 5: Formant detection using LPC
frame_length = int(0.025 * sampling_rate)  # 25ms frame
frame_step = int(0.010 * sampling_rate)    # 10ms step

num_formants = 3
formants_time = []
formants_freqs = []

window = hamming(frame_length)
order = 2 + sampling_rate // 1000  # Typical LPC order: 2 + fs/1000

for i in range(0, len(audio) - frame_length + 1, frame_step):
    frame = audio[i:i + frame_length] * window  # Apply Hamming window

    # LPC analysis (autocorrelation method)
    autocorr = np.correlate(frame, frame, mode='full')
    autocorr = autocorr[autocorr.size // 2:]  # Keep lags 0, 1, 2, ...

    R = autocorr[:order + 1]
    formant_freqs = [np.nan] * num_formants  # Default for silent frames
    if R[0] > 0:  # A silent frame makes the Toeplitz system singular
        # Solve the Yule-Walker equations for the predictor coefficients a_1..a_p
        a = solve_toeplitz(R[:order], R[1:])

        # Compute the roots of the LPC polynomial A(z) = 1 - sum_k a_k z^-k
        roots = np.roots(np.concatenate(([1], -a)))
        roots = roots[np.imag(roots) > 0]  # One root per complex-conjugate pair

        angles = np.angle(roots)
        freqs = sorted(angles * (sampling_rate / (2 * np.pi)))
        # Keep the lowest 'num_formants' frequencies, padding with NaN if too few
        formant_freqs = (freqs + [np.nan] * num_formants)[:num_formants]

    formants_time.append(i / sampling_rate)
    formants_freqs.append(formant_freqs)

# Step 6: Overlay formants on the spectrogram
formants_freqs = np.array(formants_freqs)
for i in range(num_formants):
    plt.plot(formants_time, formants_freqs[:, i], label=f'Formant {i+1}', linewidth=2)

plt.legend()
plt.show()

#### Formant Shifting

##### **Step 1. Load Audio Signal**
The audio file is loaded using `wavfile.read()`, which returns the sample rate and the signal data.

##### **Step 2. Check Audio Format and Normalize**
To handle audio files properly, the code checks whether the audio signal is stereo. If it has more than one channel, the channels are averaged to create a mono signal. The audio is then normalized to the range `[-1, 1]` by dividing it by its maximum absolute value, so that all signals are processed at the same peak level.

##### **Step 3. Set Parameters for LPC Analysis**
Parameters for the Linear Predictive Coding (LPC) analysis are defined, including the frame length and step size: the frame length is set to `25 ms` and the frame step to `10 ms`. The LPC order is computed from the sample rate as `2 + sampling_rate // 1000`. This is a common rule of thumb: roughly one formant (one pair of poles) per kHz of analysis bandwidth, plus two extra poles for the overall spectral tilt of the glottal source and lip radiation.
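As a worked example, the parameters at a few common sample rates are shown below. `lpc_parameters` is a hypothetical helper that only repeats the formulas used in the notebook cells.

```python
def lpc_parameters(sampling_rate):
    """Frame length and frame step (in samples) and the LPC order used in the notebook."""
    frame_length = int(0.025 * sampling_rate)  # 25 ms
    frame_step = int(0.010 * sampling_rate)    # 10 ms
    order = 2 + sampling_rate // 1000          # Rule-of-thumb LPC order
    return frame_length, frame_step, order


for fs in (8000, 16000, 44100):
    print(fs, lpc_parameters(fs))  # e.g. 16000 -> (400, 160, 18)
```

At 44.1 kHz the rule gives an order of 46, which is one reason formant analysis is often run on audio downsampled to roughly 10 to 16 kHz.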

##### **Step 4. Apply LPC to Analyze the Audio**
The code processes the audio in overlapping frames. For each frame, the Hamming window is applied to smooth the edges of the frame before analysis. The autocorrelation of the frame is computed to derive the LPC coefficients, which represent the vocal tract's resonances.
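`solve_toeplitz()` solves this Toeplitz system using Levinson recursion. For reference, here is a minimal sketch of the Levinson-Durbin recursion for the predictor convention used in the notebook (`x[n] ≈ sum_k a_k x[n-k]`). `levinson_durbin` is a hypothetical helper, and it should return the same coefficients as the `solve_toeplitz(R[:order], R[1:])` call.

```python
import numpy as np


def levinson_durbin(r, order):
    """Solve the Yule-Walker equations for LPC predictor coefficients.

    r: autocorrelation values r[0..order]. Returns a with
    x[n] ~= sum_k a[k-1] * x[n-k].
    """
    a = np.zeros(order)
    error = r[0]  # Prediction-error energy of the order-0 predictor
    for i in range(order):
        # Reflection coefficient for stage i + 1
        k = (r[i + 1] - np.dot(a[:i], r[i:0:-1])) / error
        a_prev = a[:i].copy()
        a[i] = k
        a[:i] = a_prev - k * a_prev[::-1]  # Update the lower-order coefficients
        error *= (1.0 - k * k)             # Error energy shrinks at every stage
    return a
```

For a valid autocorrelation sequence, every reflection coefficient `k` has magnitude below 1. This is what guarantees that the all-pole synthesis filter `1 / A(z)` built from these coefficients is stable.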

##### **Step 5. Formant Frequency Calculation and Spectrogram Display**
The roots of the LPC polynomial are computed and converted to formant frequencies, which are stored together with their time points. After the shifted signal has been reconstructed, the spectrograms of the original and the shifted audio are plotted one above the other with `plt.specgram()`, so the movement of the formant bands can be compared visually.

##### **Step 6. Shift Formants Using LPC Coefficients**
For each frame, the formants are shifted by moving the poles of the LPC filter. The roots of the LPC polynomial are computed, and the angle of every complex-conjugate pole pair is multiplied by `shift_ratio`, clipped just below the Nyquist angle `π`. The pole magnitudes, and therefore the formant bandwidths, are left unchanged. A pole's angle sets its frequency (`angle * sampling_rate / (2 * np.pi)`), so this moves every formant by the same ratio. The shifted poles are then multiplied back into a new LPC polynomial with `np.poly()`. Note that scaling each coefficient `a_k` by `shift_ratio ** k` would not shift the formants. It multiplies every pole's magnitude by `shift_ratio`, which changes the bandwidths, can push poles outside the unit circle, and leaves the frequencies unchanged.

##### **Step 7. Reconstruct the Audio Signal**
Each windowed frame is first inverse-filtered with its original LPC polynomial (`lfilter(A, [1], frame)`) to obtain the prediction residual, which carries the excitation (pitch and voicing). The residual is then passed through the all-pole synthesis filter built from the shifted polynomial (`lfilter([1], shifted_A, residual)`), so the pitch is preserved while the formants move. The reconstructed frames are overlap-added and divided by the accumulated window weights to form the complete shifted signal.

##### **Step 8. Save the Reconstructed Audio to a New WAV File**
The reconstructed signal is peak-normalized and written to a new WAV file as 16-bit PCM with `wav.write()`.

#### **Tips**:
- Test with both speech signals!

In [None]:
# Step 1: Load the speech sample
path_folder = "audio_samples_07/"
file_name = 'FILE_NAME'   # Replace with your file path
file_ext = '.wav'

filename = path_folder + file_name + file_ext
sampling_rate, audio = wav.read(filename)

# Step 2: Check if audio is stereo and convert to mono if necessary and normalize it
audio = audio.astype(np.float64)  # Float first so np.abs cannot overflow int16
if audio.ndim > 1:
    audio = audio.mean(axis=1)

audio = audio / np.max(np.abs(audio))

# Step 3: Formant detection using LPC
frame_length = int(0.025 * sampling_rate)  # 25 ms frame
frame_step = int(0.010 * sampling_rate)    # 10 ms step
order = 2 + sampling_rate // 1000          # Typical LPC order
window = hamming(frame_length)

num_formants = 3
formants_time = []
formants_freqs = []

for i in range(0, len(audio) - frame_length + 1, frame_step):
    frame = audio[i:i + frame_length] * window  # Apply Hamming window

    autocorr = np.correlate(frame, frame, mode='full')
    autocorr = autocorr[autocorr.size // 2:]

    R = autocorr[:order + 1]
    formant_freqs = [np.nan] * num_formants  # Default for silent frames
    if R[0] > 0:  # A silent frame makes the Toeplitz system singular
        a = solve_toeplitz(R[:order], R[1:])
        roots = np.roots(np.concatenate(([1], -a)))
        roots = roots[np.imag(roots) > 0]  # One root per conjugate pair
        freqs = sorted(np.angle(roots) * (sampling_rate / (2 * np.pi)))
        formant_freqs = (freqs + [np.nan] * num_formants)[:num_formants]

    formants_time.append(i / sampling_rate)
    formants_freqs.append(formant_freqs)

# Steps 4-5: Shift the formants by rotating the poles of each frame's LPC polynomial
shift_ratio = 1.1         # Shift formants upward by 10%
max_angle = np.pi - 1e-3  # Keep rotated poles just below the Nyquist angle

# Step 6: Reconstruct the audio signal using the modified LPC polynomials
reconstructed_audio = np.zeros(len(audio))
window_sum = np.zeros(len(audio))  # Accumulated window weights for overlap-add

for i in range(0, len(audio) - frame_length + 1, frame_step):
    frame = audio[i:i + frame_length] * window
    window_sum[i:i + frame_length] += window

    autocorr = np.correlate(frame, frame, mode='full')
    autocorr = autocorr[autocorr.size // 2:]
    R = autocorr[:order + 1]
    if R[0] <= 0:  # Silent frame: nothing to resynthesize
        continue

    a = solve_toeplitz(R[:order], R[1:])
    A = np.concatenate(([1], -a))       # Original LPC polynomial A(z)
    residual = lfilter(A, [1], frame)   # Inverse filter -> excitation

    # Rotate complex poles by shift_ratio; real poles and all pole magnitudes stay put
    poles = np.roots(A)
    angles = np.angle(poles)
    is_complex = np.abs(np.imag(poles)) > 1e-12
    new_angles = np.where(is_complex, np.clip(angles * shift_ratio, -max_angle, max_angle), angles)
    shifted_A = np.real(np.poly(np.abs(poles) * np.exp(1j * new_angles)))

    # Resynthesize with the shifted all-pole filter 1 / A'(z) and overlap-add
    reconstructed_audio[i:i + frame_length] += lfilter([1], shifted_A, residual)

reconstructed_audio /= np.maximum(window_sum, 1e-3)  # Undo the window overlap weighting

# Step 7: Compute and plot spectrograms
plt.figure(figsize=(12, 8))

# Original Spectrogram
plt.subplot(2, 1, 1)
plt.specgram(audio, Fs=sampling_rate, NFFT=2048, noverlap=1024, cmap='plasma')
plt.title('Original Spectrogram')
plt.xlabel('Time [s]')
plt.ylabel('Frequency [Hz]')

# Shifted Spectrogram
plt.subplot(2, 1, 2)
plt.specgram(reconstructed_audio, Fs=sampling_rate, NFFT=2048, noverlap=1024, cmap='plasma')
plt.title('Shifted Spectrogram')
plt.xlabel('Time [s]')
plt.ylabel('Frequency [Hz]')
plt.tight_layout()
plt.show()

# Step 8: Save the reconstructed audio to a new WAV file
output_file_name = f'{file_name}_formant_shifted.wav'   # Specify your output file path
reconstructed_audio = reconstructed_audio / np.max(np.abs(reconstructed_audio))  # Normalize if needed
wav.write(output_file_name, sampling_rate, (reconstructed_audio * 32767).astype(np.int16))  # Save as 16-bit PCM
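A formant shift can be expressed as a rotation of the LPC poles. The sketch below checks this on a single assumed resonance: 1000 Hz, pole radius 0.95, 16 kHz sampling. `rotate_lpc_poles` and `resonance_hz` are hypothetical helpers. For contrast, the sketch also scales each polynomial coefficient by `ratio ** k`, which only changes the pole magnitude.

```python
import numpy as np


def rotate_lpc_poles(A, ratio, max_angle=np.pi - 1e-3):
    """Return a new LPC polynomial whose complex poles are rotated by `ratio`.

    Pole magnitudes (bandwidths) and real poles are unchanged.
    """
    poles = np.roots(A)
    angles = np.angle(poles)
    is_complex = np.abs(np.imag(poles)) > 1e-12
    new_angles = np.where(is_complex, np.clip(angles * ratio, -max_angle, max_angle), angles)
    return np.real(np.poly(np.abs(poles) * np.exp(1j * new_angles)))


def resonance_hz(A, sampling_rate):
    """Frequencies (Hz) of the positive-angle poles of A(z), sorted."""
    poles = np.roots(A)
    return np.sort(np.angle(poles[np.imag(poles) > 0]) * sampling_rate / (2 * np.pi))


fs = 16000
p = 0.95 * np.exp(2j * np.pi * 1000 / fs)  # One resonance at 1000 Hz
A = np.real(np.poly([p, np.conj(p)]))      # A(z) = 1 + A[1] z^-1 + A[2] z^-2

print(resonance_hz(rotate_lpc_poles(A, 1.1), fs))  # approximately 1100 Hz
scaled = A * 1.1 ** np.arange(len(A))              # Coefficient scaling by r**k
print(resonance_hz(scaled, fs), np.abs(np.roots(scaled)))  # still ~1000 Hz, radius ~1.045
```

The scaled polynomial equals `A(z / 1.1)`, so its roots are the original roots multiplied by 1.1. With a radius above 1, the corresponding all-pole filter would be unstable.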

#### Formant Boosting

##### **Step 1. Load Audio Signal**
The audio file containing the speech sample is loaded using `wav.read()`, retrieving both the sample rate and the audio signal data. The filename is specified, pointing to the .wav file to be analyzed.

##### **Step 2. Check if Audio is Stereo and Convert to Mono**
Check if the audio signal is stereo (more than one channel). If it is, the two channels are averaged to create a mono audio signal, which simplifies further analysis.

##### **Step 3. Normalize the Audio**
The audio signal is normalized by dividing it by its maximum absolute value. This scales the samples to the range `[-1, 1]`, so every recording is processed at the same peak level.
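One pitfall when normalizing 16-bit WAV data is that `np.abs()` on `int16` samples overflows for -32768, because +32768 does not fit in `int16`. The value wraps and stays negative, so the computed peak can be far too small. The short example below uses made-up sample values to show the problem and the fix of converting to floating point first.

```python
import numpy as np

samples = np.array([-32768, 1000, -2000], dtype=np.int16)  # Illustrative 16-bit PCM values

print(np.abs(samples))          # -32768 wraps around and is still negative
print(np.max(np.abs(samples)))  # 2000 instead of 32768, so dividing would exceed [-1, 1]

safe = samples.astype(np.float64)
normalized = safe / np.max(np.abs(safe))  # Now the peak is 32768 and the result lies in [-1, 1]
print(normalized)
```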

##### **Step 4. Formant Detection Using LPC**
Parameters are set for analyzing the audio signal in frames. A frame length of `25 ms` and a step size of `10 ms` are defined. Arrays are initialized to store the time and frequency data of the detected formants. A loop iterates through the audio signal, applying a Hamming window to each frame and performing Linear Predictive Coding (LPC) analysis to calculate the LPC coefficients. The roots of the LPC polynomial are computed to determine the formant frequencies, which are then stored for later visualization.
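For reference, here is how an LPC root maps to a formant estimate. The root's angle gives the formant frequency, and its magnitude gives the approximate 3 dB bandwidth. The bandwidth is the usual criterion for discarding roots that are too broad to be formants. With a root written as $z_i = r_i e^{j\theta_i}$ and sampling rate $f_s$:

$$
F_i = \frac{\theta_i}{2\pi} f_s, \qquad
B_i \approx -\frac{f_s}{\pi} \ln r_i .
$$

For example, a root with $r_i = 0.97$ at $f_s = 8000$ Hz has a bandwidth of about $-(8000/\pi)\ln 0.97 \approx 78$ Hz. This is narrow enough to be treated as a formant.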

##### **Step 5. Overlay Formants on the Spectrogram (Original)**
A spectrogram of the original audio signal is generated using `plt.specgram()`, displaying frequency content over time. The detected formants are plotted on top of the spectrogram to visualize their positions in the frequency domain.

##### **Step 6. Boost Formants**
The formants are emphasized by narrowing their bandwidths by `boost_factor`. For each frame, the LPC polynomial is computed as before, and the frame is inverse-filtered to obtain the residual. Each complex pole is then moved toward the unit circle so that its distance from the circle, `1 - |p|`, is divided by `boost_factor`. A formant's bandwidth is roughly proportional to that distance, so each resonance becomes narrower and its peak rises relative to the spectral valleys. The residual is passed through the all-pole filter built from the modified poles, and the frames are overlap-added (normalized by the accumulated window weights) into a single reconstructed audio array.

##### **Step 7. Save the Boosted Audio to a New WAV File**
The reconstructed audio is peak-normalized to `[-1, 1]` again and saved to a new WAV file as 16-bit PCM.

##### **Step 8. Plot the Spectrogram of the Boosted Audio**
A spectrogram of the boosted audio is plotted. LPC formant detection is then repeated on the boosted signal, and its formant tracks are overlaid for comparison with the original.

#### **Tips**:
- Test with both speech signals!

In [None]:
# Step 1: Load the speech sample
path_folder = "audio_samples_07/"
file_name = 'FILE_NAME'   # Replace with your file path
file_ext = '.wav'

filename = path_folder + file_name + file_ext
sampling_rate, audio = wav.read(filename)

# Step 2: Check if audio is stereo and convert to mono if necessary
audio = audio.astype(np.float64)  # Float first so np.abs cannot overflow int16
if audio.ndim > 1:
    audio = audio.mean(axis=1)  # Average the channels to create mono audio

# Step 3: Normalize the audio
audio = audio / np.max(np.abs(audio))

# Step 4: Formant detection using LPC
frame_length = int(0.025 * sampling_rate)  # 25ms frame
frame_step = int(0.010 * sampling_rate)    # 10ms step
order = 2 + sampling_rate // 1000          # Typical LPC order
window = hamming(frame_length)

num_formants = 3
formants_time = []
formants_freqs = []

# Formant detection loop
for i in range(0, len(audio) - frame_length + 1, frame_step):
    frame = audio[i:i + frame_length] * window  # Apply Hamming window

    autocorr = np.correlate(frame, frame, mode='full')
    autocorr = autocorr[autocorr.size // 2:]

    R = autocorr[:order + 1]
    formant_freqs = [np.nan] * num_formants  # Default for silent frames
    if R[0] > 0:  # A silent frame makes the Toeplitz system singular
        a = solve_toeplitz(R[:order], R[1:])

        # Formant frequency calculation
        roots = np.roots(np.concatenate(([1], -a)))
        roots = roots[np.imag(roots) > 0]  # One root per conjugate pair
        freqs = sorted(np.angle(roots) * (sampling_rate / (2 * np.pi)))
        formant_freqs = (freqs + [np.nan] * num_formants)[:num_formants]

    formants_time.append(i / sampling_rate)
    formants_freqs.append(formant_freqs)

# Step 5: Overlay formants on the spectrogram
plt.figure(figsize=(10, 6))
plt.specgram(audio, Fs=sampling_rate, NFFT=2048, noverlap=1024, cmap='plasma')
formants_freqs = np.array(formants_freqs)
for i in range(num_formants):
    plt.plot(formants_time, formants_freqs[:, i], label=f'Formant {i+1}', linewidth=2)

plt.title('Spectrogram of Speech Sample with Formants (Original)')
plt.xlabel('Time [s]')
plt.ylabel('Frequency [Hz]')
plt.legend()
plt.show()

# Step 6: Boost formants by narrowing their bandwidths (moving poles toward the unit circle)
boost_factor = 2.0  # Divides each pole's distance 1 - |p| from the unit circle; use >= 1
reconstructed_audio = np.zeros(len(audio))
window_sum = np.zeros(len(audio))  # Accumulated window weights for overlap-add

for i in range(0, len(audio) - frame_length + 1, frame_step):
    frame = audio[i:i + frame_length] * window
    window_sum[i:i + frame_length] += window

    # Apply LPC analysis
    autocorr = np.correlate(frame, frame, mode='full')
    autocorr = autocorr[autocorr.size // 2:]
    R = autocorr[:order + 1]
    if R[0] <= 0:  # Silent frame: nothing to resynthesize
        continue
    a = solve_toeplitz(R[:order], R[1:])
    A = np.concatenate(([1], -a))      # Original LPC polynomial A(z)
    residual = lfilter(A, [1], frame)  # Inverse filter -> excitation

    # Sharpen complex poles; real poles (overall spectral tilt) are left unchanged
    poles = np.roots(A)
    is_complex = np.abs(np.imag(poles)) > 1e-12
    new_radius = np.where(is_complex, 1 - (1 - np.abs(poles)) / boost_factor, np.abs(poles))
    boosted_A = np.real(np.poly(new_radius * np.exp(1j * np.angle(poles))))

    # Reconstruct the frame with the sharpened all-pole filter and overlap-add
    reconstructed_audio[i:i + frame_length] += lfilter([1], boosted_A, residual)

reconstructed_audio /= np.maximum(window_sum, 1e-3)  # Undo the window overlap weighting

# Step 7: Save the boosted audio to a new WAV file
output_filename = path_folder + f'{file_name}_formant_boosted.wav'  # Specify your output file path
reconstructed_audio = reconstructed_audio / np.max(np.abs(reconstructed_audio))  # Normalize if needed
wav.write(output_filename, sampling_rate, (reconstructed_audio * 32767).astype(np.int16))  # Save as 16-bit PCM

# Step 8: Plot the spectrogram of the boosted audio
plt.figure(figsize=(10, 6))
plt.specgram(reconstructed_audio, Fs=sampling_rate, NFFT=2048, noverlap=1024, cmap='plasma')

# Overlay the formants on the boosted spectrogram
formants_time_boosted = []
formants_freqs_boosted = []

for i in range(0, len(reconstructed_audio) - frame_length + 1, frame_step):
    frame = reconstructed_audio[i:i + frame_length]
    frame = frame * hamming(frame_length)

    order = 2 + sampling_rate // 1000
    autocorr = np.correlate(frame, frame, mode='full')
    autocorr = autocorr[autocorr.size // 2:]

    R = autocorr[:order + 1]
    formant_freqs = [np.nan] * num_formants  # Default for silent frames
    if R[0] > 0:  # A silent frame makes the Toeplitz system singular
        a = solve_toeplitz(R[:order], R[1:])

        # Formant frequency calculation
        roots = np.roots(np.concatenate(([1], -a)))
        roots = roots[np.imag(roots) > 0]  # One root per conjugate pair
        freqs = sorted(np.angle(roots) * (sampling_rate / (2 * np.pi)))
        formant_freqs = (freqs + [np.nan] * num_formants)[:num_formants]

    formants_time_boosted.append(i / sampling_rate)
    formants_freqs_boosted.append(formant_freqs)

# Overlay formants on the boosted spectrogram
formants_freqs_boosted = np.array(formants_freqs_boosted)
for i in range(num_formants):
    plt.plot(formants_time_boosted, formants_freqs_boosted[:, i], label=f'Boosted Formant {i+1}', linewidth=2)

plt.title('Spectrogram of Boosted Speech Sample with Formants')
plt.xlabel('Time [s]')
plt.ylabel('Frequency [Hz]')
plt.legend()
plt.show()
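The effect of moving a pole pair toward the unit circle can be checked numerically on a single assumed resonance (1000 Hz, radius 0.9, 16 kHz sampling). The sketch below halves the distance `1 - r`. It then compares the bandwidth estimate `-(fs / π) * ln r` and the peak gain of the all-pole filter before and after. `sharpen_poles` is a hypothetical helper, and `scipy.signal.freqz` is used only to evaluate the frequency response.

```python
import numpy as np
from scipy.signal import freqz


def sharpen_poles(A, boost_factor):
    """Move complex poles of A(z) toward the unit circle: 1 - |p| is divided by boost_factor (>= 1)."""
    poles = np.roots(A)
    radius = np.abs(poles)
    is_complex = np.abs(np.imag(poles)) > 1e-12
    new_radius = np.where(is_complex, 1 - (1 - radius) / boost_factor, radius)
    return np.real(np.poly(new_radius * np.exp(1j * np.angle(poles))))


fs = 16000
p = 0.9 * np.exp(2j * np.pi * 1000 / fs)  # Resonance at 1000 Hz, radius 0.9
A = np.real(np.poly([p, np.conj(p)]))
A_boosted = sharpen_poles(A, 2.0)         # Radius becomes 1 - 0.1 / 2 = 0.95

for name, poly in (("original", A), ("boosted", A_boosted)):
    r = np.abs(np.roots(poly))[0]
    bandwidth = -fs / np.pi * np.log(r)          # Approximate 3 dB bandwidth in Hz
    w, h = freqz([1.0], poly, worN=4096, fs=fs)  # Response of the all-pole filter 1 / A(z)
    print(f"{name}: radius {r:.3f}, bandwidth ~{bandwidth:.0f} Hz, "
          f"peak gain {20 * np.log10(np.abs(h).max()):.1f} dB near {w[np.argmax(np.abs(h))]:.0f} Hz")
```

The bandwidth estimate drops from roughly 537 Hz to roughly 261 Hz, and the peak gain rises. This is the "boost" that the per-frame loop applies to every formant.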