# Pitch Shifting

#### Import required libraries

In [None]:
import numpy as np
import scipy.io.wavfile as wav
import matplotlib.pyplot as plt
from scipy.signal import stft, istft

#### Pitch Shifting using TD-PSOLA (Time-Domain Pitch-Synchronous Overlap and Add)

##### **Step 1. Load and Preprocess Audio**
- `Loading Audio:` The audio file is loaded using `scipy.io.wavfile.read()`, and the audio is converted to mono if it's stereo.
- `Normalization:` The audio is normalized to ensure it lies within the range `[-1, 1]`.
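
The two preprocessing steps above can be sketched on a small synthetic array instead of a real file. The helper name `to_mono_normalized` is hypothetical and not part of this notebook, and the guard for silent input is an added assumption.

```python
import numpy as np

def to_mono_normalized(audio):
    """Average channels to mono and scale the peak amplitude to 1."""
    audio = np.asarray(audio, dtype=np.float64)  # avoid integer overflow
    if audio.ndim > 1:
        audio = audio.mean(axis=1)  # average the channels
    peak = np.max(np.abs(audio))
    if peak == 0:
        return audio  # silent input: nothing to scale (assumed behaviour)
    return audio / peak

# Synthetic stereo data shaped (samples, channels), like wav.read() output
stereo = np.array([[1000, 3000], [-4000, -4000], [2000, 0]], dtype=np.int16)
mono = to_mono_normalized(stereo)
print(mono)
```

After this step the largest absolute sample is exactly 1, whatever the bit depth of the input file was.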

##### **Step 2. Set Pitch Shifting Parameters and Frame Processing**
- `Pitch Factor:` A pitch factor of `x` raises the pitch by a factor of `x`. Try `pitch_factor` values of `2`, `4` and `8`.
- `Frame Size and Hop Size:` The audio is split into overlapping frames with a `frame_size` of `512` samples and a `hop_size` of `128` samples, so consecutive frames overlap by `384` samples.
- `Frame Extraction:` We extract the frames from the audio using a list comprehension.
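
As a rough illustration of the framing step, the sketch below extracts frames from a placeholder signal with a list comprehension. The signal is synthetic, and the `+ 1` in the range is an assumption made here so that the last full frame is included.

```python
import numpy as np

frame_size = 512
hop_size = 128
audio = np.arange(2048, dtype=np.float64)  # placeholder signal

# One frame every hop_size samples, each frame_size samples long
frames = [audio[i:i + frame_size]
          for i in range(0, len(audio) - frame_size + 1, hop_size)]

num_frames = len(frames)
overlap = frame_size - hop_size  # samples shared by neighbouring frames
print(num_frames, overlap)
```

Each frame starts `hop_size` samples after the previous one, so a smaller hop gives more frames and more overlap.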

##### **Step 3. Pitch Shifting Using Frame Resampling**
- `Frame Resampling:` For each frame, the size is adjusted according to the pitch factor. A larger pitch factor results in a smaller frame size, raising the pitch.
- `Interpolation:` The frame is resampled using `np.interp()`, which performs linear interpolation to match the new frame size.
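
A minimal sketch of the per-frame resampling, assuming a hypothetical helper `resample_frame` and a synthetic sine frame. The sample positions run from `0` to `len(frame) - 1` so that they stay inside the frame.

```python
import numpy as np

def resample_frame(frame, pitch_factor):
    """Linearly resample a frame to len(frame) / pitch_factor samples."""
    new_size = int(len(frame) / pitch_factor)
    positions = np.linspace(0, len(frame) - 1, new_size)
    return np.interp(positions, np.arange(len(frame)), frame)

# Four sine cycles in 512 samples
frame = np.sin(2 * np.pi * 4 * np.arange(512) / 512)
short = resample_frame(frame, 2)
print(len(frame), len(short))
```

The same four cycles now fit into half as many samples, so at an unchanged sampling rate they play back at twice the frequency.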

##### **Step 4. Post-Processing**
- `Clipping:` The resulting audio is clipped to ensure it stays within the valid range for audio playback `([-1, 1])`.
- `Saving:` The pitch-shifted audio is saved to a new .wav file after scaling it to the 16-bit integer range.
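
The clipping and 16-bit scaling can be sketched as follows. The helper name `to_int16` is hypothetical, and the sample values are made up for illustration.

```python
import numpy as np

def to_int16(audio):
    """Clip to [-1, 1], then scale to the 16-bit integer range."""
    clipped = np.clip(audio, -1.0, 1.0)
    return (clipped * 32767).astype(np.int16)

samples = np.array([-1.5, -1.0, 0.0, 0.5, 1.2])
pcm = to_int16(samples)
print(pcm)
```

Clipping before the cast matters: values outside `[-1, 1]` would otherwise overflow the `int16` range.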

##### **Step 5. Visualizing Audio**
- `Spectrograms:` The original and pitch-shifted audio are visualized as spectrograms. This helps to confirm the pitch shift by showing the change in frequency content over time.
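
Alongside the spectrogram, the shift can also be checked numerically. This is only a sketch: `dominant_frequency` is a hypothetical helper, and two synthetic tones stand in for the original and the pitch-shifted audio.

```python
import numpy as np

def dominant_frequency(signal, sampling_rate):
    """Return the frequency (Hz) of the strongest FFT bin."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sampling_rate)
    return freqs[np.argmax(spectrum)]

sampling_rate = 8000
t = np.arange(sampling_rate) / sampling_rate  # one second of samples
original = np.sin(2 * np.pi * 200 * t)
shifted = np.sin(2 * np.pi * 400 * t)  # stand-in for a factor-2 shift

f_original = dominant_frequency(original, sampling_rate)
f_shifted = dominant_frequency(shifted, sampling_rate)
print(f_original, f_shifted, f_shifted / f_original)
```

For real recordings the strongest bin is not always the fundamental, so treat the ratio as a sanity check rather than a measurement.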

#### **Tips**:
- Set the proper file name!
- Set `pitch_factor`, `frame_size` and `hop_size` as defined in Step 2.

In [None]:
# Load the audio file
filename = 'audio_samples_07/FILE_NAME.wav'  # Replace with your .wav file
sampling_rate, audio = wav.read(filename)

# Ensure audio is mono
if audio.ndim > 1:
    audio = audio.mean(axis=1)

# Normalize audio so the peak amplitude is 1
audio = audio / np.max(np.abs(audio))

# TD-PSOLA pitch shifting
pitch_factor = ?
frame_size = ?
hop_size = ?

# Divide audio into overlapping frames (the + 1 keeps the last full frame)
frames = [audio[i:i + frame_size]
          for i in range(0, len(audio) - frame_size + 1, hop_size)]

# Resample each frame to change its pitch
# (simplified: the resampled frames are concatenated, not overlap-added)
new_audio = []
for frame in frames:
    # A pitch factor above 1 shortens the frame, which raises its pitch on playback
    new_frame_size = int(frame_size / pitch_factor)
    # Sample positions run from the first to the last valid index of the frame
    positions = np.linspace(0, frame_size - 1, new_frame_size)
    new_frame = np.interp(positions, np.arange(frame_size), frame)
    new_audio.extend(new_frame)

new_audio = np.array(new_audio)

# Ensure the audio is within range [-1, 1]
new_audio = np.clip(new_audio, -1, 1)

# Save the pitch-shifted audio
output_filename = 'td_psola_pitch_shifted.wav'
wav.write(output_filename, sampling_rate, (new_audio * 32767).astype(np.int16))

# Plot the original and pitch-shifted audio
plt.figure(figsize=(10, 6))

# Original spectrogram
plt.subplot(2, 1, 1)
plt.specgram(audio, Fs=sampling_rate, NFFT=2048, noverlap=1024, cmap='plasma')
plt.title('Original Audio')
plt.xlabel('Time [s]')
plt.ylabel('Frequency [Hz]')

# Shifted spectrogram
plt.subplot(2, 1, 2)
plt.specgram(new_audio, Fs=sampling_rate, NFFT=2048, noverlap=1024, cmap='plasma')
plt.title(f'Pitch-Shifted Audio (TD-PSOLA, Factor: {pitch_factor})')
plt.xlabel('Time [s]')
plt.ylabel('Frequency [Hz]')

plt.tight_layout()
plt.show()

#### Pitch Shifting Using Phase Vocoder

##### **Step 1. Load and preprocess the audio**
Load the audio file using `scipy.io.wavfile.read()`, which returns the sampling rate and the audio data. If the audio is stereo (more than one channel), it is converted to mono by averaging the channels using `audio.mean(axis=1)`. The audio is then normalized by dividing by the maximum absolute value to ensure the audio signal stays within the range of `-1` to `1`.

##### **Step 2. Define pitch-shifting parameters**
Set up the parameters for the pitch shifting process. The `pitch_factor` is specified, where a value greater than 1 increases the pitch, and a value less than 1 decreases it. The `window_size` defines the size of the frames for the STFT, and the `hop_size` is the step between consecutive frames, so neighbouring frames overlap by `window_size - hop_size` samples.

##### **Step 3. Compute the Short-Time Fourier Transform (STFT)**
The STFT is applied to the audio using the `scipy.signal.stft()` function, which splits the audio signal into overlapping frames and transforms each frame into the frequency domain. The result, `Zxx`, contains complex numbers representing both the magnitude and phase of each frequency component at each time frame.
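
To make the shape of `Zxx` concrete, here is a small sketch on a synthetic tone. The parameter values are assumptions chosen for illustration; note that `scipy.signal.stft()` takes the number of overlapping samples (`noverlap`), which is derived here from the hop size.

```python
import numpy as np
from scipy.signal import stft

sampling_rate = 16000
window_size = 1024   # assumed value for illustration
hop_size = 256       # assumed value for illustration
noverlap = window_size - hop_size  # scipy expects the overlap, not the hop

t = np.arange(sampling_rate) / sampling_rate
audio = np.sin(2 * np.pi * 440 * t)

frequencies, times, Zxx = stft(audio, fs=sampling_rate,
                               nperseg=window_size, noverlap=noverlap)

# Rows are frequency bins, columns are time frames
print(Zxx.shape, frequencies[1] - frequencies[0], frequencies[-1])
```

The bin spacing is `sampling_rate / window_size`, so a larger window gives finer frequency resolution at the cost of coarser time resolution.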

##### **Step 4. Pitch shifting via frequency bin interpolation**
The number of frequency bins is scaled by `pitch_factor`: a factor greater than 1 produces more bins than the original, and a factor less than 1 produces fewer. For each frame, the magnitude (`np.abs()`) and the phase (`np.angle()`) are interpolated onto the new bin grid using `np.interp()`, and the two are recombined into complex values. Linear interpolation of wrapped phase is a crude approximation, which can introduce audible artifacts.
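
The bin interpolation can be illustrated on a toy magnitude spectrum with a single peak. The sizes below are made up for readability and are far smaller than a real STFT frame.

```python
import numpy as np

num_bins = 9
pitch_factor = 2

magnitude = np.zeros(num_bins)
magnitude[2] = 1.0  # single spectral peak at bin 2

new_num_bins = int(num_bins * pitch_factor)
# Positions on the original bin axis where the new bins are sampled
positions = np.linspace(0, num_bins - 1, new_num_bins)
stretched = np.interp(positions, np.arange(num_bins), magnitude)

print(len(stretched), np.argmax(stretched))
```

The peak moves from bin 2 to roughly bin `2 * pitch_factor` on the new grid. Its height drops slightly because the new grid does not sample the original peak position exactly.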

##### **Step 5. Reconstruct the time-domain signal using inverse STFT**
After modifying the frequency bins, the inverse STFT is performed using `scipy.signal.istft()`, which reconstructs the audio signal from the shifted frequency bins back into the time domain.
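
Before modifying any bins, it can help to confirm that the STFT and inverse STFT round-trip cleanly with the chosen parameters. The sketch below assumes a synthetic tone and illustrative parameter values; both calls must use the same `nperseg` and `noverlap`.

```python
import numpy as np
from scipy.signal import stft, istft

sampling_rate = 16000
window_size = 1024   # assumed value for illustration
hop_size = 256       # assumed value for illustration
noverlap = window_size - hop_size

t = np.arange(sampling_rate) / sampling_rate
audio = np.sin(2 * np.pi * 440 * t)

_, _, Zxx = stft(audio, fs=sampling_rate,
                 nperseg=window_size, noverlap=noverlap)
_, reconstructed = istft(Zxx, fs=sampling_rate,
                         nperseg=window_size, noverlap=noverlap)

# istft may return a few padded samples; trim to the input length
reconstructed = reconstructed[:len(audio)]
max_error = np.max(np.abs(reconstructed - audio))
print(max_error)
```

If the round trip is not near-exact, the window and overlap settings are the first thing to check.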

##### **Step 6. Normalize and save the modified audio**
The pitch-shifted audio is then normalized to the range of -1 to 1 and saved to a new WAV file using `scipy.io.wavfile.write()`.

##### **Step 7. Visualize the results**
The original and pitch-shifted audio are visualized by plotting their spectrograms using `matplotlib.pyplot.specgram()`.

#### **Tips**:
- Set the proper file name!
- Set `pitch_factor`, `window_size` and `hop_size` as described in Step 2. You can reuse the values from the previous code cell, with `window_size` taking the role of `frame_size`.

In [None]:
# Load the audio file
filename = 'audio_samples_07/FILE_NAME.wav'  # Replace with your .wav file
sampling_rate, audio = wav.read(filename)

# Ensure audio is mono
if audio.ndim > 1:
    audio = audio.mean(axis=1)

# Normalize the audio
audio = audio / np.max(np.abs(audio))

# Set parameters for pitch shifting using Phase Vocoder
pitch_factor = ?  # Greater than 1 raises pitch, less than 1 lowers it
window_size = ?  # Size of the window for STFT
hop_size = ?

# Perform STFT (Short-Time Fourier Transform)
# scipy expects the number of overlapping samples, not the hop size
noverlap = window_size - hop_size
frequencies, times, Zxx = stft(audio, fs=sampling_rate, nperseg=window_size, noverlap=noverlap)

# Phase Vocoder Pitch Shifting
num_bins, num_frames = Zxx.shape
new_num_bins = int(num_bins * pitch_factor)

# Initialize a new array for shifted frequencies
shifted_Zxx = np.zeros((new_num_bins, num_frames), dtype=complex)

# Positions on the original bin axis at which the new bins are sampled
original_bins = np.arange(num_bins)
bin_positions = np.linspace(0, num_bins - 1, new_num_bins)

# Interpolate to shift pitch
for i in range(num_frames):
    # Interpolate magnitude
    magnitude = np.abs(Zxx[:, i])
    interpolated_magnitude = np.interp(bin_positions, original_bins, magnitude)

    # Interpolate phase (linear interpolation of wrapped phase is only approximate)
    phase = np.angle(Zxx[:, i])
    interpolated_phase = np.interp(bin_positions, original_bins, phase)

    # Reconstruct the shifted frequency bins
    shifted_Zxx[:, i] = interpolated_magnitude * np.exp(1j * interpolated_phase)

# Perform inverse STFT to reconstruct the audio signal (same overlap as the forward STFT)
_, shifted_audio = istft(shifted_Zxx, fs=sampling_rate, nperseg=window_size, noverlap=noverlap)

# Normalize the audio to ensure it remains within range [-1, 1]
shifted_audio = shifted_audio / np.max(np.abs(shifted_audio))

# Save the pitch-shifted audio
output_filename = 'phase_vocoder_pitch_shifted.wav'
wav.write(output_filename, sampling_rate, (shifted_audio * 32767).astype(np.int16))

# Plot the original and pitch-shifted audio spectrograms
plt.figure(figsize=(10, 6))

# Plot the original audio spectrogram
plt.subplot(2, 1, 1)
plt.specgram(audio, Fs=sampling_rate, NFFT=2048, noverlap=1024, cmap='plasma')
plt.title('Original Audio')
plt.xlabel('Time [s]')
plt.ylabel('Frequency [Hz]')

# Plot the pitch-shifted audio spectrogram
plt.subplot(2, 1, 2)
plt.specgram(shifted_audio, Fs=sampling_rate, NFFT=2048, noverlap=1024, cmap='plasma')
plt.title(f'Pitch-Shifted Audio (Phase Vocoder, Factor: {pitch_factor})')
plt.xlabel('Time [s]')
plt.ylabel('Frequency [Hz]')

plt.tight_layout()
plt.show()


#### Pitch Shifting Using Autocorrelation and Harmonic/Percussive Separation

##### **Step 1. Load the Audio File**
The audio file is read into the `audio` array, and it's converted to mono if it's stereo.

##### **Step 2. Autocorrelation**
The autocorrelation of the signal is computed using `np.correlate()`. We then extract the second half of the result, which corresponds to the non-negative time lags. The index of the correlation peak gives the pitch period in samples, and the pitch frequency is the sampling rate divided by this period.
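
A minimal sketch of autocorrelation pitch detection on a synthetic tone. The helper `estimate_pitch` is hypothetical. Because lag 0 always holds the global maximum of an autocorrelation, the sketch skips ahead to the first negative value before searching for the peak; that rule is one simple heuristic, not the only option.

```python
import numpy as np

def estimate_pitch(signal, sampling_rate):
    """Estimate the pitch (Hz) from the strongest non-zero-lag peak."""
    corr = np.correlate(signal, signal, mode='full')
    corr = corr[corr.size // 2:]  # keep lags 0, 1, 2, ...
    negative = np.where(corr < 0)[0]
    if negative.size == 0:
        return 0.0  # no periodicity found
    start = negative[0]  # skip the lobe around lag 0
    period = start + np.argmax(corr[start:])
    return sampling_rate / period

sampling_rate = 8000
t = np.arange(800) / sampling_rate  # 0.1 s keeps np.correlate fast
tone = np.sin(2 * np.pi * 200 * t)
pitch = estimate_pitch(tone, sampling_rate)
print(pitch)
```

`np.correlate()` computes the correlation directly, so its cost grows with the square of the signal length. A short excerpt is usually enough for a pitch estimate.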

##### **Step 3. Harmonic/Percussive Separation**
For simplicity, we simulate harmonic content by scaling the audio by a factor of `0.8` and assume the rest is percussive.

##### **Step 4. Pitch Shifting**
The harmonic content is stretched (or compressed) by resampling it using `np.interp()`, based on the pitch factor (`1.5` in this case).

##### **Step 5. Combining the Modified Harmonic and Percussive Content**
The resampled harmonic part is added to the percussive content, which is truncated to the length of the resampled harmonic part.
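
Steps 3 to 5 can be sketched together on a synthetic signal. The variable name `harmonic_share` is hypothetical; the values `0.8` and `1.5` are the ones given in Steps 3 and 4.

```python
import numpy as np

audio = np.sin(2 * np.pi * 5 * np.arange(300) / 300)  # synthetic input
harmonic_share = 0.8  # value from Step 3
pitch_factor = 1.5    # value from Step 4

# Step 3: crude split by scaling, not a real harmonic/percussive separation
harmonic = audio * harmonic_share
percussive = audio - harmonic

# Step 4: resample the harmonic part to fewer samples
num_samples = len(harmonic)
new_num_samples = int(num_samples / pitch_factor)
harmonic_resampled = np.interp(
    np.linspace(0, num_samples - 1, new_num_samples),
    np.arange(num_samples),
    harmonic,
)

# Step 5: truncate the percussive part to the same length and add
output = harmonic_resampled + percussive[:new_num_samples]
print(len(audio), len(output))
```

The output is shorter than the input by the pitch factor, so this approach changes the duration as well as the pitch.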

##### **Step 6. Clipping**
The resulting audio is clipped to ensure it remains in the valid range `[-1, 1]`.

##### **Step 7. Saving the Audio**
The processed audio is saved to a new `.wav` file.

##### **Step 8. Plotting**
The original and the pitch-shifted audio waveforms are plotted for comparison.


#### **Tips**:
- Set the proper file name!
- Set `harmonic` and `pitch_factor` as defined in Steps 3 and 4.

In [None]:
# Step 1: Load the audio file
filename = 'audio_samples_07/FILE_NAME.wav'  # Replace with your .wav file
sampling_rate, audio = wav.read(filename)

# Ensure audio is mono
if audio.ndim > 1:
    audio = audio.mean(axis=1)

# Normalize audio
audio = audio / np.max(np.abs(audio))  # Normalize audio to range [-1, 1]

# Step 2: Apply Autocorrelation for pitch detection
# Calculate autocorrelation of the signal (direct computation, slow for long files)
result = np.correlate(audio, audio, mode='full')
corr = result[result.size // 2:]  # Keep the second half: lags 0, 1, 2, ...

# Lag 0 always holds the global maximum, so skip ahead to the first negative
# value and take the highest peak after it as the pitch period
negative_lags = np.where(corr < 0)[0]
start = negative_lags[0] if negative_lags.size else 0
peak_index = start + np.argmax(corr[start:])
pitch_period = peak_index  # Pitch period in samples

# Ensure pitch_period is not zero to avoid divide by zero error
if pitch_period != 0:
    pitch_frequency = sampling_rate / pitch_period  # Frequency = sample rate / period
else:
    pitch_frequency = 0  # Handle case where pitch_period is zero

# Step 3: Harmonic/Percussive Separation (simplified approach)
# Simulate harmonic content (only part of the signal we modify)
harmonic = audio * ?  # Assume 80% of the signal is harmonic
percussive = audio - harmonic  # Percussive content is just the remainder

# Step 4: Modify harmonic content for pitch shifting
pitch_factor = ?  # Increase pitch by a factor of 1.5
num_samples = len(harmonic)
new_num_samples = int(num_samples / pitch_factor)

# Resample the harmonic content to new_num_samples (simple linear interpolation)
# Sample positions stop at the last valid index, num_samples - 1
harmonic_resampled = np.interp(np.linspace(0, num_samples - 1, new_num_samples), np.arange(num_samples), harmonic)

# Step 5: Combine modified harmonic and percussive content
output_audio = harmonic_resampled + percussive[:len(harmonic_resampled)]  # Combine with percussive content

# Step 6: Ensure the output is within the range [-1, 1]
output_audio = np.clip(output_audio, -1, 1)

# Step 7: Save the pitch-shifted audio
output_filename = 'autocorrelation_pitch_shifted_audio.wav'
wav.write(output_filename, sampling_rate, (output_audio * 32767).astype(np.int16))

# Step 8: Plot the original and pitch-shifted audio signals
plt.figure(figsize=(10, 6))

# Original audio
plt.subplot(2, 1, 1)
plt.plot(np.linspace(0, len(audio) / sampling_rate, len(audio)), audio)
plt.title('Original Audio')
plt.xlabel('Time [s]')
plt.ylabel('Amplitude')

# Pitch-shifted audio
plt.subplot(2, 1, 2)
plt.plot(np.linspace(0, len(output_audio) / sampling_rate, len(output_audio)), output_audio)
plt.title(f'Pitch-Shifted Audio (Autocorrelation, Factor: {pitch_factor})')
plt.xlabel('Time [s]')
plt.ylabel('Amplitude')

plt.tight_layout()
plt.show()
