# Processing long files with streams

This notebook analyzes a collection of long recordings for quality-related audio features.  It computes essentially the same features as the previous notebook (04 - Measuring Audio Quality), but summarizes each feature by its statistics over time instead of keeping the per-frame values.

The dataset that we analyze here consists of many long recordings (about 1 hour each), so loading an entire recording into memory and doing feature extraction would take more memory than we typically have on a laptop.  Instead, librosa allows us to work in *blocks* of audio at a time using the `librosa.stream()` function.  We compute the spectral roll-off and contrast features for each block, and then summarize all of the block-wise features to get a description of the entire recording.
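
To see why block-wise processing loses nothing, here is a minimal numpy-only sketch. It assumes that each block holds a fixed number of frames and overlaps the next block by `frame_length - hop_length` samples, so that uncentered framing inside each block reproduces the framing of the whole signal. The helpers `frame_rms` and `iter_blocks` are hypothetical stand-ins written for illustration (they are not librosa functions), and per-frame RMS stands in for the spectral features.

```python
import numpy as np

def frame_rms(y, frame_length, hop_length):
    """Per-frame RMS using uncentered frames (no padding at the edges)."""
    n_frames = 1 + (len(y) - frame_length) // hop_length
    return np.array([
        np.sqrt(np.mean(y[i * hop_length : i * hop_length + frame_length] ** 2))
        for i in range(n_frames)
    ])

def iter_blocks(y, block_length, frame_length, hop_length):
    """Yield overlapping blocks that each hold up to `block_length` frames."""
    # Samples needed to cover `block_length` frames
    block_samples = frame_length + (block_length - 1) * hop_length
    # Each new block starts where the next unseen frame begins
    step = block_length * hop_length
    for start in range(0, len(y) - frame_length + 1, step):
        yield y[start : start + block_samples]

# Synthetic signal and (hypothetical) analysis parameters
rng = np.random.default_rng(0)
y = rng.standard_normal(10_000)
frame_length, hop_length, block_length = 256, 64, 16

# Accumulate per-frame features one block at a time
blockwise = []
for block in iter_blocks(y, block_length, frame_length, hop_length):
    blockwise.extend(frame_rms(block, frame_length, hop_length))
blockwise = np.array(blockwise)

# Reference: the same feature computed on the whole signal at once
whole = frame_rms(y, frame_length, hop_length)

# Summarize the block-wise features over time
summary = np.percentile(blockwise, [5, 25, 50, 75, 95])
print(np.allclose(blockwise, whole), summary)
```

Only one block is held in memory at a time, yet the accumulated per-frame values match the whole-signal computation, so the percentile summary is unaffected by the blocking.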

The notebook collects the results in a pandas data frame and saves them to a CSV file that can be loaded back later for analysis.  This CSV file will power the visualization code used in the next notebook.

**Features**:
- spectral roll-off at 95% (`roll_percent=0.95`), summarized by its [5, 25, 50, 75, 95] percentiles over time
- spectral contrast, averaged across 5 octave bands starting at 80 Hz, summarized by the same percentiles over time
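
As a rough illustration of how these summaries are formed, the sketch below aggregates synthetic per-frame features into percentile columns. The random `rolloff` and `contrast` arrays are made-up placeholders, not values from the dataset; only the aggregation pattern is the point.

```python
import numpy as np

rng = np.random.default_rng(0)
n_frames = 1000

# Hypothetical per-frame features (placeholders for real measurements)
rolloff = rng.uniform(2000.0, 8000.0, size=n_frames)     # one value per frame
contrast = rng.uniform(10.0, 40.0, size=(5, n_frames))   # 5 bands x frames

quantiles = [5, 25, 50, 75, 95]

# Average contrast across bands first, giving one value per frame
mean_contrast = contrast.mean(axis=0)

# Then take percentiles over time for each feature
R = np.percentile(rolloff, quantiles)
C = np.percentile(mean_contrast, quantiles)

summary = {}
for q, r, c in zip(quantiles, R, C):
    summary[f"rolloff_{q:02d}"] = r
    summary[f"contrast_{q:02d}"] = c

print(sorted(summary))
```

Each recording is thus reduced to ten numbers, five per feature, regardless of its length.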

In [1]:
import librosa
import tqdm
import numpy as np
import pandas as pd
import os

In [33]:
def analyze_file(filename):
    
    # We need to know the sampling rate of the file in advance when streaming
    
    sr = librosa.get_samplerate(filename)
    
    # These are our analysis parameters, rescaled to match the sampling rate of the file in question
    frame_length = (2048 * sr) // 22050
    hop_length = (512 * sr) // 22050
    
    # Set up the stream for the file.  We'll look at blocks of 2048 frames at a time
    stream = librosa.stream(filename,
                            block_length=2048,
                            frame_length=frame_length,
                            hop_length=hop_length)
    
    # These lists will contain our extracted features
    rolloff = []
    contrasts = []
    
    # y here is one block's worth of audio, rather than the entire signal
    for y in stream:
        # Our analysis uses uncentered frames when streaming.  This avoids introducing artifacts at the block boundaries.
        S = np.abs(librosa.stft(y, n_fft=frame_length, hop_length=hop_length, center=False))
        
        # Compute the roll-off and append it to the `rolloff` list.  Same for contrast
        rolloff.extend(librosa.feature.spectral_rolloff(S=S, sr=sr, roll_percent=0.95)[0])
        contrasts.append(librosa.feature.spectral_contrast(S=S, sr=sr, fmin=80, n_bands=5)[1:])
    
    # Tidy up after ourselves: the stream is finished
    stream.close()
    
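    # Now compute the statistics of the features and collect them in a dict,
    # which becomes one row of the final pandas dataframe
    contrasts = np.concatenate(contrasts, axis=1)
    # Average over the octave bands to get one contrast value per frame
    mean_contrast = np.mean(contrasts, axis=0)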
    
    data = dict(filename=os.path.basename(filename))
    quantiles = [5, 25, 50, 75, 95]
    R = np.percentile(rolloff, quantiles)
    C = np.percentile(mean_contrast, quantiles)
    for q, r, c in zip(quantiles, R, C):
        data['rolloff_{:02d}'.format(q)] = r
        data['contrast_{:02d}'.format(q)] = c
    
    # Return the summary record (a dict) for this file
    return data
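
The rescaling of the analysis parameters can be checked with a little arithmetic. The sketch below uses a hypothetical helper, `analysis_params`, that repeats the integer arithmetic from `analyze_file` and also estimates how much audio one block of 2048 frames advances through, assuming consecutive blocks advance by `block_length * hop_length` samples.

```python
def analysis_params(sr, block_length=2048):
    """Rescale the 22050 Hz reference parameters to sampling rate `sr`."""
    frame_length = (2048 * sr) // 22050
    hop_length = (512 * sr) // 22050
    # Approximate duration of new audio covered by each block, in seconds
    block_seconds = block_length * hop_length / sr
    return frame_length, hop_length, block_seconds

for sr in (22050, 44100, 48000):
    frame_length, hop_length, block_seconds = analysis_params(sr)
    print(sr, frame_length, hop_length, round(block_seconds, 2))
```

Under these assumptions each block spans roughly 47.5 seconds at any of these sampling rates, so a one-hour recording is processed in about 76 blocks. Note that for rates such as 48000 Hz the rescaled frame length (4458) is not a power of two.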

In [34]:
# Get all the files that have been ogg-encoded
files = librosa.util.find_files('swdata/', ext='ogg')

In [35]:
len(files)

79

In [36]:
df = pd.DataFrame.from_records([analyze_file(f) for f in tqdm.tqdm(files)], index='filename')


  0%|          | 0/79 [00:00<?, ?it/s]
  1%|▏         | 1/79 [01:21<1:45:57, 81.50s/it]
  3%|▎         | 2/79 [06:24<3:09:54, 147.98s/it]
  4%|▍         | 3/79 [06:54<2:22:28, 112.48s/it]
  5%|▌         | 4/79 [12:06<3:35:33, 172.44s/it]
  6%|▋         | 5/79 [17:18<4:24:26, 214.41s/it]
  8%|▊         | 6/79 [21:24<4:32:20, 223.85s/it]
  9%|▉         | 7/79 [23:14<3:47:25, 189.52s/it]
 10%|█         | 8/79 [25:04<3:16:05, 165.72s/it]
 11%|█▏        | 9/79 [28:49<3:34:14, 183.64s/it]
 13%|█▎        | 10/79 [32:50<3:50:45, 200.65s/it]
 14%|█▍        | 11/79 [34:32<3:13:59, 171.16s/it]
 15%|█▌        | 12/79 [35:20<2:29:47, 134.15s/it]
 16%|█▋        | 13/79 [38:12<2:40:13, 145.66s/it]
 18%|█▊        | 14/79 [41:18<2:50:44, 157.61s/it]
 19%|█▉        | 15/79 [44:09<2:52:35, 161.81s/it]
 20%|██        | 16/79 [46:26<2:41:54, 154.20s/it]
 22%|██▏       | 17/79 [49:20<2:45:31, 160.19s/it]
 23%|██▎       | 18/79 [53:10<3:04:12, 181.18s/it

In [37]:
df.to_csv('audio_quality.csv')
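
Loading the results back later might look like the following sketch. To keep it self-contained, it writes a tiny made-up data frame to an in-memory buffer instead of reading `audio_quality.csv`; the file names and feature values are hypothetical placeholders.

```python
import io
import pandas as pd

# Hypothetical records in the same shape that analyze_file returns
records = [
    {"filename": "a.ogg", "rolloff_50": 4200.0, "contrast_50": 21.5},
    {"filename": "b.ogg", "rolloff_50": 3900.0, "contrast_50": 19.0},
]
demo = pd.DataFrame.from_records(records, index="filename")

# Write to CSV (an in-memory buffer stands in for a file on disk)
buffer = io.StringIO()
demo.to_csv(buffer)
buffer.seek(0)

# Read it back, restoring the filename column as the index
restored = pd.read_csv(buffer, index_col="filename")
print(restored)
```

With the real file, the last step would be `pd.read_csv('audio_quality.csv', index_col='filename')`.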