  ![alt text](https://www.mbari.org/wp-content/uploads/2014/11/logo-mbari-3b.png "MBARI")

  <div align="left">Copyright (c) 2021, MBARI</div>

  * Distributed under the terms of the GPL License
  * Maintainer: dcline@mbari.org
  * Authors: Danelle Cline dcline@mbari.org, John Ryan ryjo@mbari.org

# Blue A Stream Detection

This notebook demonstrates how to predict blue whale A calls on a spectrogram incrementally, using streaming PCEN. This is useful when an audio file is large or when streaming data directly from a hydrophone, and it is an alternative to detection with band-limited energy detectors (BLEDs).


## Install dependencies

First, let's install the dependencies and import all packages used in this tutorial. The install only needs to be done once for the duration of this notebook.

In [None]:
!pip install oceansoundscape==1.1.0 --quiet

In [None]:
%matplotlib inline
import boto3
from botocore import UNSIGNED
from botocore.client import Config
import numpy as np
from matplotlib import gridspec
import matplotlib.pyplot as plt
import base64
import soundfile as sf
import scipy
import sklearn.preprocessing  # import the submodule so sklearn.preprocessing.minmax_scale resolves
import shutil
import librosa
import librosa.display  # provides librosa.display.specshow
from librosa.display import specshow
import tensorflow as tf
import cv2
import requests
import timeit
import json
from numpy.lib import stride_tricks
import io
import pandas as pd
from pathlib import Path
from oceansoundscape.spectrogram import conf, colormap
from oceansoundscape.spectrogram.utils import ImageUtils as utils

# First, download an audio file to test

In [None]:
# First, let's download the data used in this notebook
bucket = 'emso-tsc2021-session3-eu-west-3'
# wav_filename = 'blue_A_stream.wav'  # shorter 5-minute example
wav_filename = 'MARS-20171101T000000Z-10min-2kHz.wav'  # 10-minute example
# wav_filename = 'MARS-20171101T000000Z-2kHz.wav'  # full-day example

s3 = boto3.resource('s3',
    aws_access_key_id='',
    aws_secret_access_key='',
    config=Config(signature_version=UNSIGNED))

# only download if needed
if not Path(wav_filename).exists():
    print('Downloading')
    s3.Bucket(bucket).download_file(wav_filename, wav_filename)
    print(f'Done downloading {wav_filename}')

samples, sample_rate = sf.read(wav_filename,dtype='float32')
nsec = (samples.size)/sample_rate # number of seconds in vector
print(f'Read {nsec} seconds of data')

# PCEN Streaming

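Here, we use librosa's streaming I/O (`librosa.stream`) together with `librosa.pcen` to apply per-channel energy normalization (PCEN) to the spectrogram incrementally, block by block. The PCEN filter state returned for one block (`zf`) is passed as the initial state (`zi`) of the next, so the normalization continues across block boundaries.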
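The effect of passing the filter state from one block to the next can be illustrated with a simplified NumPy re-implementation of the PCEN formula. This is a sketch, not librosa's code: `pcen_sketch` is a hypothetical helper, the smoothing coefficient `s` is given directly rather than derived from `time_constant`, the other defaults are illustrative, and starting the smoother at the first frame's energy is our own choice of initial state. Because the smoother state `m_prev` is handed from each block to the next, processing the spectrogram in blocks reproduces a single pass over the whole array.

```python
import numpy as np

def pcen_sketch(E, s=0.025, gain=0.98, bias=2.0, power=0.5, eps=1e-6, m_prev=None):
    """Simplified PCEN on a (n_bands, n_frames) energy array (hypothetical helper).

    Returns the normalized array and the final smoother state, so that the
    next block can continue from where this one stopped.
    """
    E = np.asarray(E, dtype=np.float64)
    if m_prev is None:
        # Assumed initial state: start the smoother at the first frame's energy
        m_prev = E[:, 0].copy()
    M = np.empty_like(E)
    for t in range(E.shape[1]):
        # First-order IIR smoother: M[t] = (1 - s) * M[t-1] + s * E[t]
        m_prev = (1.0 - s) * m_prev + s * E[:, t]
        M[:, t] = m_prev
    # Adaptive gain control followed by root compression
    P = (E / (eps + M) ** gain + bias) ** power - bias ** power
    return P, m_prev

rng = np.random.default_rng(0)
E = rng.random((8, 300)) * 100.0  # synthetic non-negative "mel energies"

# One pass over the whole spectrogram
P_full, _ = pcen_sketch(E)

# The same spectrogram in blocks of 64 frames, carrying the state across blocks
blocks, state = [], None
for start in range(0, E.shape[1], 64):
    P_blk, state = pcen_sketch(E[:, start:start + 64], m_prev=state)
    blocks.append(P_blk)
P_stream = np.concatenate(blocks, axis=1)

print(np.allclose(P_full, P_stream))  # True
```

Resetting the state at every block instead (calling `pcen_sketch` on each block with `m_prev=None`) gives different values near each block boundary, which is what carrying `zi` in the notebook avoids.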

First, set up the block reader. Each block is 20 call windows long, so it always spans more than one expected call, and within each block the prediction windows overlap their neighbors by 75%.
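To make the unit conversions in the next cell concrete, the hypothetical helper `segment_params` repeats the same arithmetic for made-up inputs: 2 kHz audio, a 1024-point FFT with 50% FFT overlap, and a 20 s call window. These are illustrative values only; the real ones come from `conf.CONF_DICT['blueA']` and `conf.OVERLAP`. Everything after `hop_length` is measured in spectrogram frames, not samples or seconds.

```python
def segment_params(sample_rate, num_fft, fft_overlap, call_duration_secs,
                   pred_overlap=0.75, calls_per_block=20):
    """Hypothetical helper mirroring the frame arithmetic used in this notebook."""
    hop_length = int(num_fft * (1 - fft_overlap))         # samples between frames
    secs_per_frame = hop_length / sample_rate             # seconds per frame
    window_size = call_duration_secs / secs_per_frame     # frames per call window
    overlap = int(call_duration_secs * pred_overlap / secs_per_frame)
    step_size = window_size - overlap                     # frames between window starts
    block_length = int(calls_per_block * call_duration_secs / secs_per_frame)
    return hop_length, secs_per_frame, int(window_size), int(step_size), block_length

# Hypothetical inputs: 2 kHz audio, 1024-point FFT, 50% FFT overlap, 20 s calls
print(segment_params(2000, 1024, 0.5, 20.0))  # (512, 0.256, 78, 20, 1562)
```

With these numbers, each prediction window spans 78 frames (about 20 s), successive windows start 20 frames (about 5.1 s) apart, and a block holds 1562 frames, roughly 20 call windows.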

In [None]:
# The optimum configuration for the call spectrogram generation
# is defined in the oceansoundscape package based on extensive
# hyperparameter sweeps. These need to match those used to train the model
blue_a_conf = conf.CONF_DICT['blueA']

# this is a global for all call types
fft_overlap = conf.OVERLAP 

# fft window and overlap - should be the same used in training the model
num_fft = blue_a_conf['num_fft']
hop_length = int(num_fft * (1 - fft_overlap)) 
call_duration_secs = blue_a_conf['duration_secs'] 
secs_per_frame = hop_length / sample_rate

# the axis to blur during spectrogram generation; freq or time or empty for no blurring
# should be the same as used in training
blur_axis = blue_a_conf['blur_axis'] 

# Block 20x the length of a call window
block_length = int(20*call_duration_secs/secs_per_frame)
 
# Overlap window for prediction by 75% 
pred_overlap = .75
overlap = int(call_duration_secs*pred_overlap/secs_per_frame)
 
window_size = call_duration_secs/secs_per_frame
step_size = window_size - overlap
num_segments = int(block_length/step_size)

freq_min = blue_a_conf['low_freq']
freq_max =  blue_a_conf['high_freq']

# PCEN parameters
pcen_gain = conf.PCEN_GAIN
pcen_bias = conf.PCEN_BIAS
pcen_tc = conf.PCEN_TIME_CONSTANT
num_mels = blue_a_conf['num_mels']

def stream_init():
    """
    Utility function to reinitialize the stream
    """
    return librosa.stream(wav_filename, block_length=block_length,
                            frame_length=num_fft,
                            hop_length=hop_length,
                            mono=True)

# Striding

After computing the spectrogram, one could simply slide a non-overlapping window across it, but a call that falls on the boundary between two windows could be missed. To avoid this, we stride over the spectrogram with overlapping segments. The `make_views` helper below does this with NumPy strides, so the segments are strided views rather than separate copies.
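For comparison, NumPy 1.20 and later provide `numpy.lib.stride_tricks.sliding_window_view`, which builds the same kind of read-only overlapping views but checks bounds, so it cannot produce a window that runs past the end of the array. This sketch uses a hypothetical helper name, `overlapping_windows`, and a toy array standing in for `P.transpose()` (time frames as rows, mel bands as columns); it keeps every `step_size`-th window along the time axis.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def overlapping_windows(arr, win_size, step_size):
    """Return windows of shape (num_windows, win_size, n_columns) from a
    (n_records, n_columns) array, stepping step_size records at a time."""
    # sliding_window_view gives shape (n_records - win_size + 1, n_columns, win_size)
    views = sliding_window_view(arr, win_size, axis=0)[::step_size]
    # Put the window axis next to the window index: (num_windows, win_size, n_columns)
    return views.transpose(0, 2, 1)

# Toy "spectrogram": 10 time frames x 3 mel bands
arr = np.arange(30).reshape(10, 3)
w = overlapping_windows(arr, win_size=4, step_size=2)
print(w.shape)  # (4, 4, 3)
print(w[1, 0])  # [6 7 8], i.e. row 2 of arr
```

This returns the `1 + (n_records - win_size) // step_size` complete windows, like the original blog version; a tail shorter than one window is dropped, so the array would need padding first if that tail must also be scored.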

In [None]:
def make_views(arr, win_size, step_size, writeable = False):
  """
  # Credit to Kevin Urban's blog for the code below to simplify striding
  https://krbnite.github.io/Memory-Efficient-Windowing-of-Time-Series-Data-in-Python-3-Memory-Strides-in-Pandas/
  arr: any 2D array whose columns are distinct variables and 
    rows are data records at some timestamp t
  win_size: size of data window (given in data points along record/time axis)
  step_size: size of window step (given in data points along record/time axis)
  writeable: if True, elements can be modified in new data structure, which will affect
    original array (defaults to False)
  
  Note that step_size is related to window overlap (overlap = win_size - step_size), in 
  case you think in overlaps.
  """
  
  # If DataFrame, use only underlying NumPy array
  if isinstance(arr, pd.DataFrame):
    arr = arr.values
  
  # Compute Shape Parameter for as_strided
  n_records = arr.shape[0]
  n_columns = arr.shape[1]
  remainder = (n_records - win_size) % step_size 
  # Note - bug fix here - add 2 not 1 as in the blog, so that the last
  # `remainder` records of the block are also covered by a window
  num_windows = 2 + int((n_records - win_size - remainder) / step_size) 
  shape = (num_windows, win_size, n_columns)

  # The extra window runs past the end of arr and as_strided does not check bounds,
  # so zero-pad the records to keep every window inside the buffer. Padding makes a
  # copy, so in that case writes no longer reach the caller's array.
  pad = (num_windows - 1) * step_size + win_size - n_records
  if pad > 0:
    arr = np.pad(arr, ((0, pad), (0, 0)))
  
  # Compute Strides Parameter for as_strided
  next_win = step_size * arr.strides[0]
  next_row, next_col = arr.strides
  strides = (next_win, next_row, next_col)
    
  print(f'shape {shape} strides {strides}')

  new_view_structure = stride_tricks.as_strided(
    arr,
    shape = shape,
    strides = strides,
    writeable = writeable,
  )
  return new_view_structure

# Visualize overlapping spectrograms

In [None]:
# Initialize the PCEN filter delays to steady state
zi = None
stream = stream_init()

# Initialize figure with subplots to visualize the overlap in the first block
fig2, axes = plt.subplots(21, 1)
fig2.set_size_inches(6, 12)

for y_block in stream:
    
    D = librosa.feature.melspectrogram(y=sklearn.preprocessing.minmax_scale(y_block, feature_range=(-2 ** 31, 2 ** 31)),
                                       sr=sample_rate, center=True, hop_length=hop_length, power=1,
                                       n_mels=num_mels, fmin=freq_min, fmax=freq_max)

    # Compute PCEN on the mel spectrum using initial delays (zi)
    P, zi =  librosa.pcen((2**31)*D, sr=sample_rate, hop_length=hop_length, gain=pcen_gain, bias=pcen_bias, 
                          time_constant=pcen_tc, zi=zi, return_zf=True)
    
    # Create strided view
    strided = make_views(P.transpose(), win_size=int(window_size), step_size=int(step_size))
     
    librosa.display.specshow(P, sr=sample_rate, fmin=freq_min, fmax=freq_max, cmap=colormap.parula_map,
                         hop_length=hop_length, x_axis='time', y_axis='mel', ax=axes[0])
    
    # Display the first 20 overlapping segments
    for i, s in enumerate(strided):
        if i > 19:
            break
        librosa.display.specshow(utils.smooth(s.transpose(), blur_axis), sr=sample_rate, fmin=freq_min, fmax=freq_max, 
                                 cmap=colormap.parula_map, hop_length=hop_length, x_axis='time', 
                                 y_axis='mel', ax=axes[i+1]) 
    break

# Download the model

In [None]:
bucket = 'pacific-sound-models'
model_filename = 'bluewhale-a-resnet50-2021-09-22-21-05-23-858.tar.gz' 
# S3 object key of the model archive; assumed here to be the file name at the bucket root
key = model_filename

# only download if needed
if not Path(model_filename).exists(): 
    print('Downloading...') 
    s3.Bucket(bucket).download_file(key, model_filename)

# Alternatively, it can be downloaded directly in SageMaker with
# !aws s3 cp s3://{bucket}/{key} . 
 
print(f'Uncompressing')
!tar -xf {model_filename}
print(f'Done')

# Load the model weights

In [None]:
model = tf.keras.models.load_model("1")

In [None]:
config = json.load(open('1/config.json'))
image_mean = np.asarray(config["image_mean"])
image_std = np.asarray(config["image_std"])
print(f"Labels {config['classes']}")
print(f"Training image mean: {image_mean}")
print(f"Training image std: {image_std}")

# Run the classifier over each optimized segment

Here, we take each PCEN-normalized segment and preprocess it the same way the training data was preprocessed: the segment is smoothed along the configured blur axis (`blur_axis`), then colorized with the same colormap and denoised in the color domain. The resulting image is scaled and normalized with the training image mean and standard deviation before it is passed to the model.
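The normalization step in the loop below can be checked in isolation. This sketch uses a random 8-bit RGB image and made-up per-channel statistics in place of `image_mean` and `image_std` from `1/config.json`; the helper name `normalize_for_model` and the 224 x 224 image size are hypothetical. It scales pixels to [0, 1], standardizes each channel by broadcasting over the last axis, and adds a leading batch axis, giving the (batch, height, width, channels) layout that Keras image models typically expect.

```python
import numpy as np

def normalize_for_model(image_rgb, image_mean, image_std, batch_size=1):
    """Scale an 8-bit RGB image to [0, 1], standardize per channel, and batch it."""
    image_float = np.asarray(image_rgb).astype('float32') / 255.0
    # Broadcasting applies the per-channel mean/std along the last (channel) axis
    image_float = (image_float - image_mean) / image_std
    # Stack batch_size copies along a new leading batch axis
    return np.concatenate([image_float[np.newaxis, :, :]] * batch_size)

rng = np.random.default_rng(1)
image = rng.integers(0, 256, size=(224, 224, 3)).astype(np.uint8)  # hypothetical image
mean = np.array([0.5, 0.4, 0.3])  # hypothetical per-channel statistics
std = np.array([0.2, 0.2, 0.2])

batch = normalize_for_model(image, mean, std)
print(batch.shape)  # (1, 224, 224, 3)
```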

In [None]:
zi = None
stream = stream_init()
batch_size = 1

# Collect one row per segment and build the DataFrame once at the end
# (DataFrame.append was removed in pandas 2.0)
rows = []

start_time = timeit.default_timer()
for block_idx, y_block in enumerate(stream):

    D = librosa.feature.melspectrogram(y = sklearn.preprocessing.minmax_scale(y_block, feature_range=(-2 ** 31, 2 ** 31)), 
                                       sr = sample_rate, 
                                       center = False, 
                                       hop_length = hop_length, 
                                       power = 1, 
                                       n_mels = num_mels, 
                                       fmin = freq_min, 
                                       fmax = freq_max)
 
    # Compute PCEN on the mel spectrum, starting from the filter delays (zi) left by the previous block
    P, zi = librosa.pcen((2**31)*D, 
                         sr = sample_rate, 
                         hop_length = hop_length, 
                         gain = pcen_gain, 
                         bias = pcen_bias, 
                         time_constant = pcen_tc, 
                         zi = zi, 
                         return_zf = True)
          
    # Create strided view
    strided = make_views(P.transpose(), win_size=int(window_size), step_size=int(step_size)) 
    
    for i, s in enumerate(strided): 
        image_path = Path(f'block{block_idx}_stride{i}.jpg')
        
        # Smooth, then colorize and denoise; the image is written to image_path and read back
        strided_smoothed = utils.smooth(s.transpose(), blur_axis)
        strided_colored = utils.colorizeDenoise(strided_smoothed, image_path)
        image_bgr = cv2.imread(image_path.as_posix())
        image = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)

        # normalize with the same parameters used in training
        image_float = np.asarray(image).astype('float32')
        image_float = image_float / 255.0
        image_float = (image_float - image_mean) / image_std

        image = np.concatenate([image_float[np.newaxis, :, :]] * batch_size)
        tensor_out = model(image)
        score_baf, score_bat = tensor_out.numpy()[0]
        
        # Segment start time in seconds: blocks from librosa.stream are contiguous in frames,
        # so block block_idx begins block_length frames after the previous one
        secs = (block_idx * block_length + i * int(step_size)) * secs_per_frame
        print(f'Processing segment {i} bat {score_bat} baf {score_baf} start_time {secs}')
        rows.append({'time_secs': secs, 'score_baf': score_baf, 'score_bat': score_bat})
        
        # uncomment the following line to save the classification image along with an encoded score in the filename
        # this can be useful for visual browsing
        # shutil.copy2(image_path, f'block{block_idx}_stride{i}_{int(score_bat*100):02}.jpg')
        
        # remove the image 
        image_path.unlink()

df = pd.DataFrame(rows, columns=["time_secs", "score_baf", "score_bat"])
total_seconds = timeit.default_timer() - start_time 
print(f'Done. Processed {nsec} seconds of data in {total_seconds} seconds')

In [None]:
df

## Plot model scores

Here, we plot the model's blue A "true" scores (`score_bat`). These could be further processed, e.g., with a median filter, and then thresholded to produce a detection time series.
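One way to do that post-processing is sketched below with NumPy only. The helper names, the kernel size (5 segments) and the threshold (0.5) are hypothetical choices, not values from this notebook or the oceansoundscape package; in practice they would be tuned on labeled data. The running median suppresses isolated high scores, and the thresholded series is then collapsed into (start, end) intervals.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def median_filter(scores, kernel_size=5):
    """Running median with edge padding; kernel_size should be odd."""
    half = kernel_size // 2
    padded = np.pad(np.asarray(scores, dtype=float), half, mode='edge')
    return np.median(sliding_window_view(padded, kernel_size), axis=1)

def detections(times, scores, threshold=0.5, kernel_size=5):
    """Return (start, end) times of runs where the smoothed score exceeds threshold."""
    above = median_filter(scores, kernel_size) > threshold
    intervals, start = [], None
    for i, flag in enumerate(above):
        if flag and start is None:
            start = i                                  # a run of detections begins
        elif not flag and start is not None:
            intervals.append((float(times[start]), float(times[i - 1])))  # run ended
            start = None
    if start is not None:                              # run lasts to the last segment
        intervals.append((float(times[start]), float(times[-1])))
    return intervals

# Synthetic example: an isolated spike at index 1 and a sustained call at indices 5-9
times = np.arange(13) * 5.0
scores = [0.1, 0.9, 0.1, 0.1, 0.1, 0.8, 0.9, 0.95, 0.9, 0.85, 0.1, 0.1, 0.1]
print(detections(times, scores))  # [(25.0, 45.0)]
```

In this notebook the inputs would be `df.time_secs` and `df.score_bat`.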

In [None]:
fig = plt.figure(figsize=(32, 8))
gs = gridspec.GridSpec(2, 1, height_ratios=[1, 1])
 
D = librosa.feature.melspectrogram(y = sklearn.preprocessing.minmax_scale(samples, feature_range=(-2 ** 31, 2 ** 31)), 
                                   sr = sample_rate, 
                                   center = False, 
                                   hop_length = hop_length, 
                                   power = 1, 
                                   n_mels = num_mels, 
                                   fmin = freq_min, 
                                   fmax = freq_max)
 
P =  librosa.pcen((2**31)*D, 
                  sr = sample_rate, 
                  hop_length = hop_length, 
                  gain = pcen_gain, 
                  bias = pcen_bias, 
                  time_constant = pcen_tc)
 
plt.subplot(gs[0])
librosa.display.specshow(utils.smooth(P, blur_axis), 
                         sr = sample_rate, 
                         fmin = freq_min, 
                         fmax = freq_max, 
                         cmap = colormap.parula_map, 
                         hop_length = hop_length, 
                         x_axis='time', y_axis='mel')
 
plt.subplot(gs[1])
plt.plot(df.time_secs, df.score_bat, 'o', color='black', markersize=9)
plt.ylabel('Model Blue A True Score')
plt.xlabel('Seconds')