<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Abstract" data-toc-modified-id="Abstract-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Abstract</a></span></li><li><span><a href="#Documentation" data-toc-modified-id="Documentation-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Documentation</a></span><ul class="toc-item"><li><span><a href="#Algorithm" data-toc-modified-id="Algorithm-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>Algorithm</a></span></li><li><span><a href="#Implementation" data-toc-modified-id="Implementation-2.2"><span class="toc-item-num">2.2&nbsp;&nbsp;</span>Implementation</a></span></li></ul></li><li><span><a href="#Examples" data-toc-modified-id="Examples-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Examples</a></span></li><li><span><a href="#Appendix:-list-of-analysis-frequencies-and-pitches" data-toc-modified-id="Appendix:-list-of-analysis-frequencies-and-pitches-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Appendix: list of analysis frequencies and pitches</a></span></li></ul></div>

# Abstract

The NessStretch is a refinement of [Paul Nasca](http://www.paulnasca.com/)'s excellent [PaulStretch](http://hypermammut.sourceforge.net/paulstretch/) algorithm, with two major changes:

1. Whereas PaulStretch uses a single FFT frame size throughout the entire frequency range, the NessStretch supports a multiband FFT stretch, in which each frequency band is analyzed with its own frame size (shorter frames for higher bands).  This greatly improves the depth and intelligibility of the stretch in the higher register.
2. PaulStretch uses a conventional analysis-synthesis window pair.  (The window choice itself, though unconventional, is essentially the square root of a COLA-compliant window, [following custom](https://ccrma.stanford.edu/~jos/sasp/COLA_Examples.html).)  The NessStretch supports variable crossfades between windows based on frame correlation, as specified [here](https://www.researchgate.net/publication/307994109_SIGNAL-MATCHED_POWER-COMPLEMENTARY_CROSS-FADING_AND_DRY-WET_MIXING).
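
To make the idea of a correlation-dependent crossfade concrete, here is a minimal sketch. It is not taken from the NessStretch source or copied from the linked paper; it only relies on the fact that mixing two equal-power signals `x` and `y` with gains `a` and `b` and correlation coefficient `r` yields a power proportional to `a**2 + b**2 + 2*r*a*b`. The function names, the linear starting ramps, and the clipping floor of `-0.9` are all assumptions made for illustration.

```python
import numpy as np

def frame_correlation(x, y):
    """Normalized correlation of two equal-length frames (hypothetical helper)."""
    denom = np.sqrt(np.sum(x ** 2) * np.sum(y ** 2))
    return float(np.sum(x * y) / denom) if denom > 0 else 0.0

def correlation_matched_fades(r, n):
    """Return (fade_out, fade_in) gains of length n that keep the mix power
    constant for two equal-power signals with correlation coefficient r."""
    # Assumption: floor r to avoid dividing by ~0 for anti-correlated frames.
    r = max(float(r), -0.9)
    t = np.linspace(0.0, 1.0, n)
    fade_out, fade_in = 1.0 - t, t
    # Relative power of fade_out * x + fade_in * y for unit-power x and y.
    power = fade_out ** 2 + fade_in ** 2 + 2.0 * r * fade_out * fade_in
    norm = np.sqrt(power)
    return fade_out / norm, fade_in / norm

# r = 1 (identical frames) gives a linear crossfade;
# r = 0 (uncorrelated frames) gives an equal-power crossfade.
g_out_lin, g_in_lin = correlation_matched_fades(1.0, 5)
g_out_pow, g_in_pow = correlation_matched_fades(0.0, 5)
```

With scrambled phases, adjacent frames tend to be only weakly correlated, so under these assumptions the gains would sit closer to the equal-power end of this family than to the linear end.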

# Documentation

TODO: I'll update this after Sam and I work more on the 2021 ICMC paper.

## Implementation

The NessStretch implementation is *somewhat* similar to the PaulStretch stereo Python implementation:

* Both implementations generate timestretch frames by stepping through the input sample array more slowly (by the timestretch factor) than the output sample array.  This creates de facto spectral interpolation.
* Both scramble the frequency bin phases in some way.
* Both use an [RFFT](https://numpy.org/doc/stable/reference/generated/numpy.fft.rfft.html) to optimize analysis and synthesis for real-valued input and output.
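
The three shared steps above can be sketched in a few lines. This is a single-band, mono illustration rather than either project's actual code: the function name `stretch_frames`, the square-root Hann window, the 50% output overlap, and the uniform random phases are assumptions chosen to keep the example short.

```python
import numpy as np

def stretch_frames(samples, nfft, stretch, rng=None):
    """Time-stretch a mono float array by `stretch` using one FFT frame size."""
    rng = np.random.default_rng() if rng is None else rng
    # Assumption: periodic Hann, square-rooted, applied at analysis and synthesis.
    n = np.arange(nfft)
    window = np.sqrt(0.5 - 0.5 * np.cos(2.0 * np.pi * n / nfft))
    hop_out = nfft // 2          # output advances by half a frame
    hop_in = hop_out / stretch   # input advances `stretch` times more slowly
    n_frames = int((len(samples) - nfft) / hop_in) + 1
    out = np.zeros((n_frames - 1) * hop_out + nfft)
    for k in range(n_frames):
        start = int(k * hop_in)
        frame = samples[start:start + nfft] * window
        magnitudes = np.abs(np.fft.rfft(frame))
        # Keep the magnitudes, replace every bin phase with a random one.
        phases = rng.uniform(0.0, 2.0 * np.pi, len(magnitudes))
        synth = np.fft.irfft(magnitudes * np.exp(1j * phases), n=nfft)
        out[k * hop_out:k * hop_out + nfft] += synth * window
    return out

# Usage: stretch a short 440 Hz test tone by a factor of 4.
tone = np.sin(2.0 * np.pi * 440.0 * np.arange(4096) / 48000.0)
stretched = stretch_frames(tone, nfft=256, stretch=4, rng=np.random.default_rng(0))
```

Because consecutive input frames overlap heavily when `stretch` is large, their magnitude spectra change slowly from frame to frame, which is where the de facto spectral interpolation comes from.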

There are, however, a couple of implementation differences that are not purely cosmetic:

* PaulStretch writes synthesis frames to a buffer, and the buffer content is appended to an output audio file.  This is efficient (there's no need for large intermediate files), but the buffer math is a bit of a headache, and there's no simple way to mix different output layers together.  Instead, NessStretch loads a large `mix_bus` array for each channel, to which it adds the output from each time-stretched frequency band.  Unlike PaulStretch, this generates some large intermediate files (roughly 10 MB per channel per minute), but the process is more transparent, and mixing the frequency bands together is trivial.
* PaulStretch doesn't normalize the output audio (which makes sense, because there's no simple way to normalize an audio file "in real time"; you would have to use some sort of dynamics processing).  NessStretch normalizes the maximum output to the maximum input (all the audio data is stored in arrays ahead of time, so this is easy to do).
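
As a rough illustration of the mix-bus and normalization steps described above, the sketch below sums per-band outputs into one array and then rescales so that the output peak matches the input peak. The function name `mix_and_normalize` and its arguments are hypothetical, and the real script may organize this differently (for example, per channel).

```python
import numpy as np

def mix_and_normalize(input_samples, band_outputs):
    """Sum stretched bands into one mix bus and match the input peak level."""
    length = max(len(band) for band in band_outputs)
    mix_bus = np.zeros(length)
    for band in band_outputs:
        # Each band is simply added; shorter bands leave the tail untouched.
        mix_bus[:len(band)] += band
    in_peak = np.max(np.abs(input_samples))
    out_peak = np.max(np.abs(mix_bus))
    if out_peak > 0:
        mix_bus *= in_peak / out_peak
    return mix_bus

# Usage with toy data: two "bands" of different lengths.
source = np.array([0.5, -0.25])
bands = [np.array([1.0, 1.0, 1.0]), np.array([1.0, -3.0])]
mixed = mix_and_normalize(source, bands)
```

Holding everything in arrays is what makes the peak lookup a one-liner here; a streaming design would not know the output peak until the last frame had been written.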

Some miscellaneous script details that may not be obvious:

* RFFT bins: an RFFT returns nfft // 2 + 1 bins total.  Bin 0 is the DC component (0 Hz), and bin nfft // 2 is the Nyquist component (sampling rate / 2 Hz).
* Input file padding: pad the input audio with nfft // 2 samples on either end to center the analysis windows correctly.  (It's easy to check this by time-stretching an impulse signal: the output should sound like a symmetrical filter sweep.)
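
Both details are easy to confirm with NumPy. The values of `nfft`, `sample_rate`, and the impulse length below are arbitrary choices for the example, not settings taken from the script.

```python
import numpy as np

nfft = 1024           # example frame size (assumption)
sample_rate = 48000   # example sampling rate (assumption)

# Center frequency of every RFFT bin, from DC up to Nyquist.
bin_freqs = np.fft.rfftfreq(nfft, 1 / sample_rate)
n_bins = len(bin_freqs)          # nfft // 2 + 1
dc_hz = bin_freqs[0]             # bin 0
nyquist_hz = bin_freqs[-1]       # bin nfft // 2

# Pad nfft // 2 zeros on either end so the first analysis window
# is centered on the first input sample.
impulse = np.zeros(4096)
impulse[0] = 1.0
padded = np.pad(impulse, nfft // 2, mode="constant")
```

After padding, the impulse sits at index `nfft // 2`, which is the middle of the first analysis frame.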

# Examples

* [You stay on my mind before I wake up](https://alexness.bandcamp.com/album/you-stay-on-my-mind-before-i-wake-up)


# Appendix: list of analysis frequencies and pitches

In [66]:
import numpy as np

fancy_bands = {
    256: (65, 129),
    512: (65, 129),
    1024: (65, 129),
    2048: (65, 129),
    4096: (65, 129),
    8192: (65, 129),
    16384: (65, 129),
    32768: (65, 129),
    65536: (0, 129)
    }
input_sample_rate = 48000
freqs = []
for window_size in reversed(sorted(fancy_bands.keys())):
    low_bin, high_bin = fancy_bands[window_size]
    band_freqs = np.fft.rfftfreq(window_size, 1/input_sample_rate)[low_bin:high_bin]
    freqs.extend(band_freqs)
    
# adapted from
# https://www.johndcook.com/blog/2016/02/10/musical-pitch-notation/
A4 = 440
# C0 lies 57 semitones (4.75 octaves) below A4
C0 = A4*pow(2, -4.75)
names = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

class Pitch(object):
    def __init__(self, freq):
        self.freq = freq
        self.step, self.microtone = divmod(12 * np.log2(freq/C0), 1)
        self.cents_deviation = 100 * self.microtone
        self.octave, n = [int(i) for i in divmod(self.step, 12)]
        self.name = names[int(n)]
    def __str__(self):
        return f'{self.freq:10.2f} {self.name:2} {self.cents_deviation:5.2f} {self.octave:3}'

# skip bin 0 (DC, 0 Hz), which has no defined pitch: log2(0) is -inf
for i, f in enumerate(freqs[1:]):
    p = Pitch(f)
    print(f'{i+1:4}{p}')

   1      0.73 F# 23.26  -5
   2      1.46 F# 23.26  -4
   3      2.20 C# 25.22  -3
   4      2.93 F# 23.26  -3
   5      3.66 A#  9.58  -3
   6      4.39 C# 25.22  -2
   7      5.13 D# 92.09  -2
   8      5.86 F# 23.26  -2
   9      6.59 G# 27.17  -2
  10      7.32 A#  9.58  -2
  11      8.06 B  74.58  -2
  12      8.79 C# 25.22  -1
  13      9.52 D  63.79  -1
  14     10.25 D# 92.09  -1
  15     10.99 F  11.53  -1
  16     11.72 F# 23.26  -1
  17     12.45 G  28.22  -1
  18     13.18 G# 27.17  -1
  19     13.92 A  20.78  -1
  20     14.65 A#  9.58  -1
  21     15.38 A# 94.05  -1
  22     16.11 B  74.58  -1
  23     16.85 C  51.54   0
  24     17.58 C# 25.22   0
  25     18.31 C# 95.89   0
  26     19.04 D  63.79   0
  27     19.78 D# 29.13   0
  28     20.51 D# 92.09   0
  29     21.24 E  52.84   0
  30     21.97 F  11.53   0
  31     22.71 F  68.30   0
  32     23.44 F# 23.26   0
  33     24.17 F# 76.54   0
  34     24.90 G  28.22   0
  35     25.63 G  78.40   0
  36     26.37 G# 27

Non-uniform bins:

Spliced octave:

fancy_bands = {
    256: (65, 129),    # top octave:  64 bins
    512: (65, 129),    # next octave: 64 bins
                       # spliced octave: 96 bins
    1024: (97, 129),   # next fourth: 32 bins
    2048: (129, 193),  # next fifth:  64 bins
    4096: (129, 257),  # next octave: 128 bins
    8192: (129, 257),  # next octave
    16384: (129, 257), # etc.
    32768: (129, 257),
    65536: (129, 257),
    131072: (0, 257),
    }
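
To check a band layout like the one above, it can help to convert the bin ranges to Hz. The helper below is a hypothetical sketch (`band_edges_hz` does not appear in the script) and assumes the 48 kHz sampling rate used elsewhere in this appendix. Bin ranges are treated as half-open, as in the slicing `[low_bin:high_bin]`.

```python
import numpy as np

def band_edges_hz(bands, sample_rate):
    """Map each window size to the (lowest, highest) bin frequency it covers."""
    edges = {}
    for nfft, (low_bin, high_bin) in bands.items():
        bin_hz = sample_rate / nfft
        # high_bin is exclusive, so the last bin used is high_bin - 1.
        edges[nfft] = (low_bin * bin_hz, (high_bin - 1) * bin_hz)
    return edges

spliced_bands = {
    256: (65, 129),
    512: (65, 129),
    1024: (97, 129),
    2048: (129, 193),
    4096: (129, 257),
    8192: (129, 257),
    16384: (129, 257),
    32768: (129, 257),
    65536: (129, 257),
    131072: (0, 257),
}
edges = band_edges_hz(spliced_bands, 48000)

# Walk from the largest window (lowest band) to the smallest and confirm
# that every band starts above the previous band's highest bin.
ordered = [edges[nfft] for nfft in sorted(edges, reverse=True)]
bands_ascend = all(prev[1] < nxt[0] for prev, nxt in zip(ordered, ordered[1:]))
```

Under these assumptions the spliced octave spans roughly 3023 Hz to 6000 Hz: the 2048-sample window covers the lower part (a fifth) and the 1024-sample window covers the upper part (a fourth).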


Better time resolution in the top octaves:

fancy_bands = {
    128: (33, 65),     # top octave: 32 bins
    256: (33, 65),     # next octave: 32 bins
    1024: (65, 129),   # next octave: 64 bins
    4096: (129, 257),  # next octave: 128 bins
    8192: (129, 257),
    16384: (129, 257),
    32768: (129, 257),
    65536: (129, 257),
    131072: (0, 257),
    }


As above, with spliced octaves:

fancy_bands = {
    128: (33, 65),     # top octave: 32 bins
    256: (33, 65),     # next octave: 32 bins
                       # spliced octave: 48 bins
    512: (49, 65),     # next fourth: 16 bins
    1024: (65, 97),    # next fifth: 32 bins
                       # spliced octave: 96 bins
    2048: (97, 129),   # next fourth: 32 bins
    4096: (129, 193),  # next fifth: 64 bins
    8192: (129, 257),  # next octave: 128 bins
    16384: (129, 257), # etc.
    32768: (129, 257),
    65536: (129, 257),
    131072: (0, 257),
    }


In [1]:
import numpy as np

fancy_bands = {
    128: (33, 65),     # top octave: 32 bins
    256: (33, 65),     # next octave: 32 bins
                       # spliced octave: 48 bins
    512: (49, 65),     # next fourth: 16 bins
    1024: (65, 97),    # next fifth: 32 bins
                       # spliced octave: 96 bins
    2048: (97, 129),   # next fourth: 32 bins
    4096: (129, 193),  # next fifth: 64 bins
    8192: (129, 257),  # next octave: 128 bins
    16384: (129, 257), # etc.
    32768: (129, 257),
    65536: (129, 257),
    131072: (0, 257),
    }

input_sample_rate = 48000
freqs = []
for window_size in reversed(sorted(fancy_bands.keys())):
    low_bin, high_bin = fancy_bands[window_size]
    band_freqs = np.fft.rfftfreq(window_size, 1/input_sample_rate)[low_bin:high_bin]
    freqs.extend(band_freqs)
    
# adapted from
# https://www.johndcook.com/blog/2016/02/10/musical-pitch-notation/
A4 = 440
# C0 lies 57 semitones (4.75 octaves) below A4
C0 = A4*pow(2, -4.75)
names = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

class Pitch(object):
    def __init__(self, freq):
        self.freq = freq
        self.step, self.microtone = divmod(12 * np.log2(freq/C0), 1)
        self.cents_deviation = 100 * self.microtone
        self.octave, n = [int(i) for i in divmod(self.step, 12)]
        self.name = names[int(n)]
    def __str__(self):
        return f'{self.freq:10.2f} {self.name:2} {self.cents_deviation:5.2f} {self.octave:3}'

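# skip bin 0 (DC, 0 Hz), which has no defined pitch, and check that
# the concatenated band frequencies are strictly increasing
thisFreq = 0
for i, f in enumerate(freqs[1:]):
    assert thisFreq < f
    thisFreq = f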
    p = Pitch(f)
    print(f'{i+1:4}{p}')

   1      0.37 F# 23.26  -6
   2      0.73 F# 23.26  -5
   3      1.10 C# 25.22  -4
   4      1.46 F# 23.26  -4
   5      1.83 A#  9.58  -4
   6      2.20 C# 25.22  -3
   7      2.56 D# 92.09  -3
   8      2.93 F# 23.26  -3
   9      3.30 G# 27.17  -3
  10      3.66 A#  9.58  -3
  11      4.03 B  74.58  -3
  12      4.39 C# 25.22  -2
  13      4.76 D  63.79  -2
  14      5.13 D# 92.09  -2
  15      5.49 F  11.53  -2
  16      5.86 F# 23.26  -2
  17      6.23 G  28.22  -2
  18      6.59 G# 27.17  -2
  19      6.96 A  20.78  -2
  20      7.32 A#  9.58  -2
  21      7.69 A# 94.05  -2
  22      8.06 B  74.58  -2
  23      8.42 C  51.54  -1
  24      8.79 C# 25.22  -1
  25      9.16 C# 95.89  -1
  26      9.52 D  63.79  -1
  27      9.89 D# 29.13  -1
  28     10.25 D# 92.09  -1
  29     10.62 E  52.84  -1
  30     10.99 F  11.53  -1
  31     11.35 F  68.30  -1
  32     11.72 F# 23.26  -1
  33     12.08 F# 76.54  -1
  34     12.45 G  28.22  -1
  35     12.82 G  78.40  -1
  36     13.18 G# 27