In [1]:
import sys
sys.path.append("../../")
from radarange_orchestrator import llm

In [2]:
from radarange_orchestrator.tools import all_tools
m = llm()

llama_init_from_model: n_ctx_per_seq (80128) < n_ctx_train (131072) -- the full capacity of the model will not be utilized


In [7]:
chat = m.chat(tools = all_tools)

In [8]:
prompt = """
Can you read the article https://arxiv.org/pdf/2502.01769 and give me a brief walkthrough of what happened?
"""

In [None]:
chat.add_user_message(prompt)
m.act(chat, max_tokens_per_message = 10000, temperature=0, on_message=chat.append, max_prediction_rounds=10)
chat.show_final_answer()

Called download_file with href: https://arxiv.org/pdf/2502.01769 and filename: 2502.01769.pdf
Taken 0.4 seconds to complete.
Called pdf_read with path: ./downloads/2502.01769.pdf
Taken 0.2 seconds to complete.


In [6]:
# chat.add_user_message("""
# Can you explain the formulas (9-11) from the article?
# """)
# m.act(chat, max_tokens_per_message = 10000, temperature=0, on_message=chat.append, max_prediction_rounds=10)
# chat.show_final_answer()

In [8]:
chat.add_user_message("""
Could you help me implement the phase vocoder from the article we began with in python?
Assume, data is already loaded to f: NDArray[np.float64], normalized to 1, and with given framerate.
I would like to make a pitch shift on X semitones in either directions
""")
m.act(chat, max_tokens_per_message = 10000, temperature=0, on_message=chat.append, max_prediction_rounds=10)
chat.show_final_answer()



Here's a Python implementation of a basic phase vocoder for pitch shifting, based on the principles from the article. This code assumes you have a normalized audio signal `f` and sample rate `sr`, and shifts the pitch by `X` semitones:

```python
import numpy as np

def pitch_shift(f, sr, semitones):
    # Compute time scaling factor (alpha) for pitch shift
    alpha = 2 ** (-semitones / 12)  # Pitch shift formula
    
    # Parameters
    window_size = 2048               # FFT window size (power of 2)
    hop_length = window_size // 4    # Analysis hop size (50% overlap)
    synthesis_hop = int(hop_length * alpha)  # Synthesis hop size
    
    # Window function (Hann window)
    window = np.hanning(window_size)
    
    # Initialize output buffer
    output_length = int(len(f) * alpha)  # Estimated output length
    output = np.zeros(output_length)
    output_pos = 0  # Current position in output array
    
    # Previous frame's phase and magnitude
    prev_phase = None
    prev_mag = None

    for i in range(0, len(f), hop_length):
        # Extract current frame
        frame = f[i:i+window_size]
        if len(frame) < window_size:
            break  # End of signal
        
        # Apply windowing
        windowed = frame * window
        
        # Compute STFT
        fft = np.fft.fft(windowed)
        mag = np.abs(fft)
        phase = np.angle(fft)
        
        if prev_phase is None:
            # First frame: no phase modification
            prev_phase = phase
            prev_mag = mag
            # Add to output (no scaling yet)
            first_frame = np.fft.ifft(fft).real
            output[0:window_size] += first_frame
            output_pos = synthesis_hop  # Next frame starts here
        else:
            # Compute phase difference and unwrap
            phase_diff = phase - prev_phase
            phase_diff_unwrapped = np.unwrap(phase_diff)
            
            # Compute new phase with scaling
            new_phase = prev_phase + alpha * phase_diff_unwrapped
            
            # Reconstruct modified frame
            modified_fft = mag * np.exp(1j * new_phase)
            modified_frame = np.fft.ifft(modified_fft).real
            
            # Overlap-add to output
            start = int(output_pos)
            end = start + window_size
            output[start:end] += modified_frame
            
            # Update position for next frame
            output_pos += synthesis_hop
            
            # Update previous values for next iteration
            prev_phase = new_phase
            prev_mag = mag
        
        # Move to next analysis frame
        prev_frame_start = i

    # Truncate to actual output length
    return output[:int(output_pos)]
```

### Key Components:
1. **Scaling Factor**: 
   - `alpha = 2^(-semitones/12)` adjusts time scaling. Pitch shift by X semitones corresponds to frequency scaling of `2^(X/12)`, so time scaling is the inverse.

2. **Windowing**:
   - Uses a Hann window to reduce spectral leakage.

3. **Phase Correction**:
   - Computes phase differences between consecutive frames.
   - Unwraps phase to avoid discontinuities.
   - Scales phase increments by `alpha` to achieve pitch shift.

4. **Overlap-Add Synthesis**:
   - Reconstructs the signal using inverse FFT and overlap-add to maintain continuity.

### Usage:
```python
# Example usage:
import numpy as np
from scipy.io import wavfile

# Load your audio data (replace with your own)
# sr, f = wavfile.read('input.wav')
# f = f.astype(np.float64)  # Ensure float64
# f /= np.max(np.abs(f))    # Normalize to 1

# Pitch shift by +2 semitones (higher pitch)
shifted_audio = pitch_shift(f, sr, semitones=2)

# Save output (adjust as needed)
# wavfile.write('output.wav', sr, shifted_audio)
```

### Notes:
- **Window Size**: Adjust `window_size` (e.g., 1024, 4096) for different time-frequency resolutions.
- **Hop Length**: Controls overlap. Smaller values (e.g., `window_size//4`) improve quality but increase computation.
- **Edge Cases**: The code may need padding for signals not perfectly aligned with window/hop sizes.
- **Phase Unwrapping**: Basic unwrapping is used; for long signals, consider tracking phase over all frames for each bin.

This implementation provides a foundational approach. For production use, consider:
- Handling edge frames with zero-padding.
- Using more robust phase unwrapping (e.g., tracking phase over all frames for each bin).
- Optimizing with FFT libraries like `librosa` for efficiency.

In [9]:
m.close()

In [10]:
chat.display_thoughts()