  ![alt text](https://www.mbari.org/wp-content/uploads/2014/11/logo-mbari-3b.png "MBARI")

  <div align="left">Copyright (c) 2021, MBARI</div>

  * Distributed under the terms of the GPL License
  * Maintainer: dcline@mbari.org
  * Authors: Danelle Cline dcline@mbari.org, John Ryan ryjo@mbari.org

# Blue A Stream Detection

This notebook demonstrates how to detect blue A calls on a spectrogram incrementally, using streaming per-channel energy normalization (PCEN). This is useful when an audio file is large or when streaming data directly from a hydrophone. It is an alternative to using BLEDs for detection.

## Install dependencies

First, let's install dependencies and include all packages used in this tutorial. This only needs to be done once for the duration of this notebook.

In [None]:
%pip install oceansoundscape==1.1.0 --quiet

In [None]:
%matplotlib inline
import boto3
from botocore import UNSIGNED
from botocore.client import Config
import datetime
import glob
import numpy as np
from matplotlib import gridspec
import matplotlib.pyplot as plt
import base64
import soundfile as sf
import scipy
import sklearn.preprocessing  # import the submodule explicitly; minmax_scale is used below
import shutil
import librosa
import librosa.display
import tensorflow as tf
import cv2
import requests
import timeit
import json
from numpy.lib import stride_tricks
import io
import pandas as pd
import dateutil.parser
from scipy.signal import find_peaks
from pathlib import Path
from oceansoundscape.spectrogram import conf, colormap
from oceansoundscape.spectrogram.utils import ImageUtils as utils

## Local data store

Set the path to the directory where the data will be stored.

In [None]:
data_path = Path.cwd().parent.parent / 'data'
if not data_path.exists():
    data_path.mkdir(parents=True)

# First, download an audio file to test

In [None]:
# Download the data used in this notebook
def fetch(file_path: Path):
    """
    Download a wav file from the S3 bucket unless it already exists locally
    """ 
    s3 = boto3.resource('s3',
        aws_access_key_id='',
        aws_secret_access_key='',
        config=Config(signature_version=UNSIGNED))

    if not file_path.exists():
        print('Downloading')
        s3.Bucket('emso-tsc2021-session3-eu-west-3').download_file(file_path.name, file_path.as_posix())
        print(f'Done downloading {file_path}') 
    
# wav_path = data_path / 'blue_A_stream.wav' # shorter 5 minute example
wav_path = data_path / 'MARS-20171101T000000Z-10min-2kHz.wav' 
fetch(wav_path)
samples, sample_rate = sf.read(wav_path.as_posix())
nsec = (samples.size)/sample_rate # number of seconds in vector
print(f'Read {nsec} seconds of data')

# PCEN Streaming

Here, we can use librosa's streaming IO together with `librosa.pcen` to apply dynamic per-channel energy normalization (PCEN) to a spectrogram incrementally.
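
Streaming PCEN works because the smoothing filter's state is carried from one block to the next, which is the role of the `zi` and `return_zf` arguments of `librosa.pcen`. The sketch below illustrates the idea with a plain first-order low-pass filter in pure Python. The `smooth_block` helper, the coefficient `b`, and the toy energy values are illustrative assumptions, not librosa's implementation.

```python
def smooth_block(energy, b, state):
    """First-order low-pass filter: m[t] = (1 - b) * m[t-1] + b * e[t].

    Returns the smoothed values and the final state, so the next
    block can pick up where this one left off.
    """
    smoothed = []
    for e in energy:
        state = (1.0 - b) * state + b * e
        smoothed.append(state)
    return smoothed, state


b = 0.25  # hypothetical smoothing coefficient
energy = [1.0, 4.0, 2.0, 8.0, 3.0, 5.0, 7.0, 6.0]  # toy energy values

# Process everything in one pass
full, _ = smooth_block(energy, b, state=0.0)

# Process in two blocks, handing the filter state from one block to the next
first, state = smooth_block(energy[:4], b, state=0.0)
second, _ = smooth_block(energy[4:], b, state)
streamed = first + second

# Restarting the filter for the second block gives different values
restarted, _ = smooth_block(energy[4:], b, state=0.0)

print(streamed == full)
print(restarted == full[4:])
```

Carrying the state makes the block-wise result identical to the single-pass result, while restarting the filter at each block boundary would introduce a transient at the start of every block.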

First, set up the block reader to work on audio segments at least as long as an expected call, with each prediction window overlapping its neighbors by 75%.

In [None]:
# The optimum configuration for the call spectrogram generation
# is defined in the oceansoundscape package based on extensive
# hyperparameter sweeps. These need to match those used to train the model
blue_a_conf = conf.CONF_DICT['blueA']

# this is a global for all call types
fft_overlap = conf.OVERLAP 

# fft window and overlap - should be the same used in training the model
num_fft = blue_a_conf['num_fft']
hop_length = int(num_fft * (1 - fft_overlap)) 
call_duration_secs = blue_a_conf['duration_secs'] 
secs_per_frame = hop_length / sample_rate

# the axis to blur during spectrogram generation; freq or time or empty for no blurring
# should be the same as used in training
blur_axis = blue_a_conf['blur_axis'] 

# Block 20x the length of a call window
block_length = int(20*call_duration_secs/secs_per_frame)
 
# Overlap window for prediction by 75% 
pred_overlap = .75
overlap = int(call_duration_secs*pred_overlap/secs_per_frame)
 
window_size = call_duration_secs/secs_per_frame
step_size = window_size - overlap
num_segments = int(block_length/step_size)

freq_min = blue_a_conf['low_freq']
freq_max =  blue_a_conf['high_freq']

# PCEN parameters
pcen_gain = conf.PCEN_GAIN
pcen_bias = conf.PCEN_BIAS
pcen_tc = conf.PCEN_TIME_CONSTANT
num_mels = blue_a_conf['num_mels']

def stream_init(filename):
    """
    Utility function to reinitialize the stream
    """
    return librosa.stream(filename, block_length=block_length,
                            frame_length=num_fft,
                            hop_length=hop_length,
                            mono=True)
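
To make the arithmetic above concrete, here is a worked example with made-up numbers. Only the 2 kHz sample rate and the 75% prediction overlap come from this notebook; `num_fft`, `fft_overlap`, and `call_duration_secs` are assumed values chosen for illustration, and the real ones come from `conf`.

```python
# Hypothetical settings for illustration; the real values come from
# oceansoundscape's conf module
sample_rate = 2000         # Hz, matching the 2 kHz recording
num_fft = 1024             # assumed FFT window length in samples
fft_overlap = 0.5          # assumed fraction of overlap between FFT windows
call_duration_secs = 20    # assumed call duration
pred_overlap = 0.75        # prediction windows overlap by 75%

hop_length = int(num_fft * (1 - fft_overlap))        # samples between frames
secs_per_frame = hop_length / sample_rate            # seconds per spectrogram column

window_size = call_duration_secs / secs_per_frame    # frames per prediction window
overlap = int(call_duration_secs * pred_overlap / secs_per_frame)
step_size = window_size - overlap                    # frames between window starts
block_length = int(20 * call_duration_secs / secs_per_frame)  # frames per block

print(f'hop_length={hop_length} samples, {secs_per_frame} s per frame')
print(f'window={int(window_size)} frames, step={int(step_size)} frames')
print(f'block={block_length} frames, about {int(block_length / step_size)} steps per block')
```

With these assumed values, each prediction window advances by roughly a quarter of its own length, which is what a 75% overlap means in frames.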

# Striding

After computing the spectrogram, one could simply run a non-overlapping window across it, but that could miss a call that lands on a window boundary. To remedy this, the spectrogram is split into overlapping segments using strides, which the function below simplifies.

In [None]:
def make_views(arr, win_size, step_size, writeable = False):
  """
  Credit to Kevin Urban's blog for the code below, which simplifies striding:
  https://krbnite.github.io/Memory-Efficient-Windowing-of-Time-Series-Data-in-Python-3-Memory-Strides-in-Pandas/
  arr: any 2D array whose columns are distinct variables and 
    rows are data records at some timestamp t
  win_size: size of data window (given in data points along record/time axis)
  step_size: size of window step (given in data points along record/time axis)
  writeable: if True, elements can be modified in new data structure, which will affect
    original array (defaults to False)
  
  Note that step_size is related to window overlap (overlap = win_size - step_size), in 
  case you think in overlaps.
  """
  
  # If DataFrame, use only underlying NumPy array
  if isinstance(arr, pd.DataFrame):
    arr = arr.values
  
  # Compute Shape Parameter for as_strided
  n_records = arr.shape[0]
  n_columns = arr.shape[1]
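  remainder = (n_records - win_size) % step_size 
  # 1 + ... counts only the windows that fit entirely inside the array; a larger
  # count would let as_strided read past the end of the buffer. max() guards
  # against blocks that are shorter than one window.
  num_windows = max(0, 1 + (n_records - win_size - remainder) // step_size)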
  shape = (num_windows, win_size, n_columns)
  
  # Compute Strides Parameter for as_strided
  next_win = step_size * arr.strides[0]
  next_row, next_col = arr.strides
  strides = (next_win, next_row, next_col)
    
  # print(f'shape {shape} strides {strides}')

  new_view_structure = stride_tricks.as_strided(
    arr,
    shape = shape,
    strides = strides,
    writeable = writeable,
  )
  return new_view_structure
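
For comparison, NumPy 1.20 and later provide `sliding_window_view`, which builds the same kind of overlapping view with bounds checking. The sketch below applies it to a toy array; it is an alternative shown for illustration, not the method used in this notebook, and the array contents and sizes are made up.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Toy "spectrogram": 10 time frames (rows) by 3 frequency bins (columns)
arr = np.arange(30).reshape(10, 3)
win_size, step_size = 4, 2

# One window per starting row, with the window axis appended last: shape (7, 3, 4)
all_windows = sliding_window_view(arr, win_size, axis=0)

# Keep every step_size-th window and put time before frequency: shape (4, 4, 3)
windows = all_windows[::step_size].transpose(0, 2, 1)

print(windows.shape)
print(windows[1])  # rows 2 through 5 of arr
```

Ten rows with a window of 4 and a step of 2 yield four windows, starting at rows 0, 2, 4, and 6, all of which lie inside the array.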

# Visualize overlapping spectrograms

In [None]:
# Initialize the PCEN filter delays to steady state
zi = None
stream = stream_init(wav_path.as_posix())

# Initialize figure with subplots to visualize the overlap in the first block
fig2, axes = plt.subplots(21, 1)
fig2.set_size_inches(6, 12)

for y_block in stream:
    
    # Pass the audio as a keyword argument; recent librosa releases require y=
    D = librosa.feature.melspectrogram(y=sklearn.preprocessing.minmax_scale(y_block, feature_range=(-2 ** 31, 2 ** 31)),
                                       sr=sample_rate, center=True, hop_length=hop_length, power=1,
                                       n_mels=num_mels, fmin=freq_min, fmax=freq_max)

    # Compute PCEN on the mel spectrum using initial delays (zi)
    P, zi =  librosa.pcen((2**31)*D, sr=sample_rate, hop_length=hop_length, gain=pcen_gain, bias=pcen_bias, 
                          time_constant=pcen_tc, zi=zi, return_zf=True)
    
    # Create strided view
    strided = make_views(P.transpose(), win_size=int(window_size), step_size=int(step_size))
     
    librosa.display.specshow(P, sr=sample_rate, fmin=freq_min, fmax=freq_max, cmap=colormap.parula_map,
                         hop_length=hop_length, x_axis='time', y_axis='mel', ax=axes[0])
    
    # Display the first 20 overlapping segments
    for i, s in enumerate(strided):
        if i > 19:
            break
        librosa.display.specshow(utils.smooth(s.transpose(), blur_axis), sr=sample_rate, fmin=freq_min, fmax=freq_max, 
                                 cmap=colormap.parula_map, hop_length=hop_length, x_axis='time', 
                                 y_axis='mel', ax=axes[i+1]) 
    break

# Download the model

In [None]:
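bucket = 'pacific-sound-models'
model_filename = 'bluewhale-a-resnet50-2021-09-22-21-05-23-858.tar.gz' 
# Assumption: the object key in the bucket is the same as the file name
key = model_filename

# Anonymous (unsigned) S3 resource; the one created inside fetch() is local to that function
s3 = boto3.resource('s3', config=Config(signature_version=UNSIGNED))

# only download if needed
if not Path(model_filename).exists(): 
    print('Downloading...') 
    s3.Bucket(bucket).download_file(key, model_filename)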

# Alternatively, it can be downloaded directly in SageMaker with
# !aws s3 cp s3://{bucket}/{key} . 
 
print(f'Uncompressing')
!tar -xf {model_filename}
print(f'Done')

# Load the model weights

In [None]:
model = tf.keras.models.load_model("1")

In [None]:
with open('1/config.json') as f:
    config = json.load(f)
image_mean = np.asarray(config["image_mean"])
image_std = np.asarray(config["image_std"])
print(f"Labels {config['classes']}")
print(f"Training image mean: {image_mean}")
print(f"Training image std: {image_std}")
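
The training mean and standard deviation are applied to every image before it is passed to the model. A minimal sketch of that normalization on a toy image follows; the statistics and pixel values are hypothetical stand-ins for the ones stored in `config.json`.

```python
import numpy as np

# Hypothetical per-channel statistics; the real ones come from config.json
image_mean = np.array([0.2, 0.3, 0.4])
image_std = np.array([0.1, 0.2, 0.25])

# Toy 2x2 RGB image with 8-bit pixel values
image = np.full((2, 2, 3), 128, dtype=np.uint8)

# Scale to [0, 1], then standardize each channel
image_float = image.astype('float32') / 255.0
image_norm = (image_float - image_mean) / image_std

# Add a leading batch axis: (height, width, channels) -> (1, height, width, channels)
batch = image_norm[np.newaxis, ...]
print(batch.shape)
```

Broadcasting applies the three per-channel values along the last axis, so the same statistics are used for every pixel.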

# Run the classifier over each optimized segment

Here, we take each PCEN-normalized segment and preprocess it further in the same manner as the training data. This preprocessing applies the same colormap, smooths the image along the configured blur axis, then denoises the colorized spectrogram in the color domain.

In [None]:
def process(file_path:Path, quiet=False):
    zi = None
    stream = stream_init(file_path.absolute())
    batch_size = 1
    secs = 0

    df = pd.DataFrame(columns=["date_time", "score_baf", "score_bat"]) 
    # splits out 20171101T000000Z from MARS-20171101T000000Z-2kHz.wav
    s = file_path.stem.split('-')[1] 
    # convert to a datetime object
    start_dt = datetime.datetime.strptime(s, '%Y%m%dT%H%M%SZ')

    start_time = timeit.default_timer()
    print(f'Processing {file_path}')
    for y_block in stream:

        D = librosa.feature.melspectrogram(y=sklearn.preprocessing.minmax_scale(y_block, feature_range=(-2 ** 31, 2 ** 31)), 
                                           sr = sample_rate, 
                                           center = False, 
                                           hop_length = hop_length, 
                                           power = 1, 
                                           n_mels = num_mels, 
                                           fmin = freq_min, 
                                           fmax = freq_max)

        # Compute PCEN on the mel spectrum using initial delays (zi)
        P, zi =  librosa.pcen((2**31)*D, 
                              sr = sample_rate, 
                              hop_length = hop_length, 
                              gain = pcen_gain, 
                              bias = pcen_bias, 
                              time_constant = pcen_tc, 
                              zi = zi, 
                              return_zf = True)

        # Create strided view
        strided = make_views(P.transpose(), win_size=int(window_size), step_size=int(step_size)) 

        for i, s in enumerate(strided): 
            image_path = Path(f'block{y_block[0]}_stride{i}.jpg')

            strided_smoothed = utils.smooth(s.transpose(), blur_axis)
            strided_colored = utils.colorizeDenoise(strided_smoothed, image_path)
            image_bgr = cv2.imread(image_path.as_posix())
            image = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)

            # normalize with the same parameters used in training
            image_float = np.asarray(image).astype('float32')
            image_float = image_float / 255.0
            image_float = (image_float - image_mean) / image_std

            image = np.concatenate([image_float[np.newaxis, :, :]] * batch_size)
            tensor_out = model(image)
            score_baf, score_bat = tensor_out.numpy()[0]

            date_time = start_dt + datetime.timedelta(microseconds=int(secs*1e6))

            if not quiet:
                print(f'Processing segment {i} bat {score_bat} baf {score_baf} start_time {date_time}')
            secs += step_size*secs_per_frame

            # DataFrame.append was removed in pandas 2.0; add the row in place
            # (values follow the column order date_time, score_baf, score_bat)
            df.loc[len(df)] = [date_time, score_baf, score_bat]

            # uncomment the following line to save the classification image along with an encoded score in the filename
            # this can be useful for visual browsing
            # shutil.copy2(image_path, f'block{y_block[0]}_stride{i}_{int(score_bat*100):02}.jpg')

            # remove the image 
            image_path.unlink()

    total_seconds = timeit.default_timer() - start_time 
    print(f' {file_path.name} done. Processed {secs} seconds of data in {total_seconds} seconds')
    
    # Save to a csv file in the same directory as this notebook for later use
    df.to_csv(Path.cwd() / f'{file_path.name}.csv') 
    
    return df
    
df = process(wav_path)
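
The timestamp attached to each score is the recording start time, parsed from the file name, plus the time offset of the segment. A minimal sketch of that calculation follows; the stride of 5.25 seconds is a made-up value, whereas in `process` it is `step_size*secs_per_frame`.

```python
import datetime

stem = 'MARS-20171101T000000Z-10min-2kHz'  # file name without the .wav suffix

# The second dash-separated field holds the UTC start time
start_dt = datetime.datetime.strptime(stem.split('-')[1], '%Y%m%dT%H%M%SZ')

# Hypothetical stride: each segment starts 5.25 seconds after the previous one
secs_per_step = 5.25

timestamps = []
for i in range(3):
    offset = datetime.timedelta(microseconds=int(i * secs_per_step * 1e6))
    timestamps.append(start_dt + offset)
    print(i, timestamps[-1].isoformat())
```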

## Plot model scores

Here, we plot the scores for the blue A true class and run a peak detector over them to locate the calls. These peaks serve as a proxy for the call counts in the time-series analysis.

In [None]:
# Filter out only the higher scores
df_true = df[df['score_bat'] > 0.70]

# Find peaks at least as far apart as the typical duration of a call
call_step = round(call_duration_secs/(step_size*secs_per_frame))
peaks = find_peaks(df_true.score_bat.values, distance=call_step)

fig = plt.figure(figsize=(32, 8))
gs = gridspec.GridSpec(2, 1, height_ratios=[1, 1])
 
D = librosa.feature.melspectrogram(y=sklearn.preprocessing.minmax_scale(samples, feature_range=(-2 ** 31, 2 ** 31)), 
                                   sr = sample_rate, 
                                   center = False, 
                                   hop_length = hop_length, 
                                   power = 1, 
                                   n_mels = num_mels, 
                                   fmin = freq_min, 
                                   fmax = freq_max)
 
P =  librosa.pcen((2**31)*D, 
                  sr = sample_rate, 
                  hop_length = hop_length, 
                  gain = pcen_gain, 
                  bias = pcen_bias, 
                  time_constant = pcen_tc)
 
plt.subplot(gs[0])
librosa.display.specshow(utils.smooth(P, blur_axis), 
                         sr = sample_rate, 
                         fmin = freq_min, 
                         fmax = freq_max, 
                         cmap = colormap.parula_map, 
                         hop_length = hop_length, 
                         x_axis='time', y_axis='mel')
 
plt.subplot(gs[1])
plt.plot(df_true.date_time.values, df_true.score_bat, 'o', color='grey', markersize=10)
plt.plot(df_true.date_time.values[peaks[0]], df_true.score_bat.values[peaks[0]], "x", color='red', markersize=15)
plt.ylabel('Model Blue A True Score')
plt.xlabel('Time')
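
The effect of the `distance` argument can be illustrated with a small pure-Python peak picker. This is a simplified stand-in written for illustration, not SciPy's implementation; the `pick_peaks` helper and the toy scores are assumptions.

```python
def pick_peaks(scores, distance):
    """Return indices of local maxima that are at least `distance` samples apart.

    Taller peaks win: candidates are visited from highest to lowest and a
    candidate is dropped if it is closer than `distance` to a kept peak.
    """
    # Local maxima: strictly greater than both neighbours
    candidates = [i for i in range(1, len(scores) - 1)
                  if scores[i - 1] < scores[i] > scores[i + 1]]
    kept = []
    for i in sorted(candidates, key=lambda i: scores[i], reverse=True):
        if all(abs(i - k) >= distance for k in kept):
            kept.append(i)
    return sorted(kept)


# Toy score sequence with several nearby bumps
scores = [0.71, 0.80, 0.75, 0.95, 0.72, 0.74, 0.73, 0.90, 0.78]
print(pick_peaks(scores, distance=1))
print(pick_peaks(scores, distance=4))
```

With a minimum distance of one sample, every local maximum is reported. With a distance of four, the smaller bumps next to a taller peak are suppressed, so overlapping windows that score the same call are counted once.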

## Batch processing

Let's process one week of data, running the model across each day, finding peaks for each call.  The week of November 12 - 18, 2017 is a good week, with strong variation in the call index and no recording gaps.

In [None]:
days = range(12, 19)  # November 12 through 18, inclusive
year = 2017
month = 11  

# for d in days:
#     filename = data_path / f'MARS-{year}{month:02}{d:02}T000000Z-2kHz.wav'
#     fetch(filename) 
#     process(filename, quiet=True)

## Combine all the results and filter only high scoring data

In [None]:
# remove the test file - we don't want to include this in the time-series
%rm MARS-20171101T000000Z-10min-2kHz.wav.csv

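# Sort by name so the files are combined in chronological order
files = sorted(Path.cwd().glob('*.csv'))
for f in files:
    print(f)
# DataFrame.append was removed in pandas 2.0; concatenate the files instead
df = pd.concat([pd.read_csv(f) for f in files], ignore_index=True)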

In [None]:
def iso_date(date_string):
    return dateutil.parser.parse(date_string)

df['call_start'] = df.apply(lambda x: iso_date(x['date_time']), axis=1)

# Filter out only the higher scores
df_70 = df[df['score_bat'] > 0.70]
 
# Find peaks at least as far apart as the typical duration of a call
print('Finding peaks...')
call_step = round(call_duration_secs/(step_size*secs_per_frame))
peaks = find_peaks(df_70.score_bat.values, distance=call_step)
print('Done')

# Create a dataframe with the peaks
df_calls = pd.DataFrame(index=peaks[0])
df_calls['date_time'] = df_70.date_time.values[peaks[0]]
df_calls['calls'] = 1
df_calls.index = df_calls.apply(lambda x: iso_date(x['date_time']), axis=1)
df_calls = df_calls.drop(columns=['date_time'])
df_calls

### Save and plot hourly binned data for the week 

In [None]:
df_calls_hourly = df_calls.resample('H').sum()
df_calls_hourly.to_csv('HourlyCNN-12-18Nov2017')
len(df_calls_hourly)
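
What the hourly resampling computes can be sketched by hand: each detection time is truncated to the start of its hour, the detections are counted per hour, and hours without detections are filled with zero. The detection times below are made up for illustration.

```python
import datetime
from collections import Counter

# Hypothetical detection times (one entry per detected call)
calls = [
    datetime.datetime(2017, 11, 12, 0, 5),
    datetime.datetime(2017, 11, 12, 0, 40),
    datetime.datetime(2017, 11, 12, 2, 15),
]

# Truncate each time to the start of its hour and count
per_hour = Counter(t.replace(minute=0, second=0, microsecond=0) for t in calls)

# Walk every hour from first to last so empty hours show up as zero
hour = min(per_hour)
last_hour = max(per_hour)
hourly = []
while hour <= last_hour:
    hourly.append((hour, per_hour.get(hour, 0)))
    hour += datetime.timedelta(hours=1)

for bin_start, count in hourly:
    print(bin_start.isoformat(), count)
```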

In [None]:
fig, ax = plt.subplots(figsize=(12, 3))
 
# df_calls_hourly is already binned by hour, so it can be plotted directly
ax.scatter(df_calls_hourly.index.values,
           df_calls_hourly['calls'],
           color='blue')

# Set title and labels for axes
ax.set(xlabel="Date",
       ylabel="Total calls detected (score > 0.7)",
       title="Blue whale A Calls Hourly 12-18Nov2017")

plt.show()