[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1yHECdcu9zydeXmVU5r7smMPWIOE6kXAE)

# Echolocation Segment Extraction 





In [None]:
import pandas as pd

In [None]:
# datetime packages
from datetime import timezone
from datetime import datetime, timedelta

In [None]:
# orca_hls_utils is a small package to facilitate the file extraction
# !pip install orca_hls_utils

In [None]:
# this version has an extra option to overwrite existing files, which is handy while testing
!pip install git+https://github.com/orcasound/orca-hls-utils

In [None]:
import orca_hls_utils as ohu

In [None]:
from orca_hls_utils.DateRangeHLSStream import DateRangeHLSStream

### Reading Human Annotations

We will extract echolocation timestamps from the human annotations submitted through the Orcasound website. These annotations are stored in a Postgres database on Heroku. Below, we use an older dump of that database, stored in a publicly accessible Google Sheet.

In [None]:
# download the Google Sheet as a CSV file
!wget "https://docs.google.com/spreadsheets/d/1G3TxQVMUN3GIzhW7FEqLipQEK95E0GfwZE-qbnMHTy4/export?format=csv" -O human-detections.csv

In [None]:
df = pd.read_csv('human-detections.csv', parse_dates=['timestamp'])

In [None]:
df

In [None]:
# keywords that indicate echolocation in an annotation's description
key_list = ['echolocation', 'clicks', 'clicking', 'click']

In [None]:
df_clean = df.dropna(axis=0, subset=['description'])

In [None]:
df_key = df_clean.loc[df_clean['description'].str.contains('|'.join(key_list), case=False)]
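The filter above joins the keywords into a single regular expression (`echolocation|clicks|clicking|click`) and keeps every row whose description contains any of them, ignoring case. A small sketch on invented descriptions (the toy DataFrame is hypothetical, not Orcasound data) shows what gets matched; the `re.escape` call is an extra precaution that the original cell does not use.

```python
import re

import pandas as pd

# Hypothetical toy annotations, invented for illustration (not Orcasound data).
toy = pd.DataFrame({'description': [
    'Echolocation heard', 'Lots of CLICKING', 'ship noise', None, 'calls and whistles']})

key_list = ['echolocation', 'clicks', 'clicking', 'click']
# str.contains treats the joined string as a regex alternation; re.escape guards
# against any future keyword that contains regex characters such as '.' or '+'.
pattern = '|'.join(re.escape(k) for k in key_list)

# Drop missing descriptions first, as in the notebook, then match case-insensitively.
toy_clean = toy.dropna(axis=0, subset=['description'])
matches = toy_clean.loc[toy_clean['description'].str.contains(pattern, case=False)]
print(matches['description'].tolist())  # ['Echolocation heard', 'Lots of CLICKING']
```

Because the match is substring-based, `click` alone already covers `clicks` and `clicking`, and it would also match words such as `clicked`; stricter matching would need word boundaries in the pattern.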

In [None]:
print(f'There are {df_key.shape[0]} echolocation observations.')

In [None]:
df_key.timestamp

In [None]:
# map each feed_id to the hydrophone node name used in the stream URL
feed_dict = {1: 'orcasound_lab', 2: 'bush_point', 35: 'port_townsend'}

In [None]:
# set the stream base for the first echolocation example
stream_base = 'https://s3-us-west-2.amazonaws.com/streaming-orcasound-net/rpi_' + feed_dict[df_key.iloc[0]['feed_id']]
stream_base
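The stream URL is built by looking up the annotation's `feed_id` in `feed_dict`. A minimal sketch of a hypothetical helper, `stream_base_for`, that performs the same lookup but fails with a readable message for feed ids missing from `feed_dict`; it also accepts float ids such as `2.0`, which pandas can produce when the column contains missing values.

```python
# Base URL and feed ids copied from the notebook; stream_base_for is a hypothetical helper.
S3_BASE = 'https://s3-us-west-2.amazonaws.com/streaming-orcasound-net/rpi_'
feed_dict = {1: 'orcasound_lab', 2: 'bush_point', 35: 'port_townsend'}


def stream_base_for(feed_id):
    """Return the HLS stream base URL for a hydrophone feed id."""
    try:
        node = feed_dict[int(feed_id)]  # int() accepts 2 and 2.0 alike
    except KeyError:
        raise ValueError(f'unknown feed_id {feed_id!r}; add it to feed_dict') from None
    return S3_BASE + node


print(stream_base_for(2))
# https://s3-us-west-2.amazonaws.com/streaming-orcasound-net/rpi_bush_point
```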

### Downloading Segments for all Observations

We will use the `DateRangeHLSStream` class from the `orca_hls_utils` package, which downloads the HLS (`.ts`) segments within a time range and writes them out as a `.wav` file.

In [None]:
# convert the first annotation's timestamp (assumed to be UTC) to Unix time in seconds
annot_unix_time = pd.to_datetime(df_key.timestamp.iloc[0]).replace(tzinfo=timezone.utc).timestamp()
annot_unix_time
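`.replace(tzinfo=timezone.utc)` does not convert the time; it only labels the naive timestamp as UTC before taking the Unix timestamp. That is correct only if the annotation timestamps are stored in UTC, which this sketch assumes. A worked example on a fixed date:

```python
from datetime import timezone

import pandas as pd

# A naive timestamp, like the values parse_dates produces when the CSV has no UTC offset.
ts = pd.to_datetime('2020-01-01 00:00:00')

# replace(tzinfo=...) attaches UTC without shifting the clock time.
unix_seconds = ts.replace(tzinfo=timezone.utc).timestamp()
print(unix_seconds)  # 1577836800.0

# tz_localize('UTC') is the pandas-native way to do the same for a naive timestamp.
print(ts.tz_localize('UTC').timestamp() == unix_seconds)  # True
```

If the timestamps were ever stored with a non-UTC offset, `replace` would silently mislabel them; `tz_convert('UTC')` on a tz-aware value would be the safer choice in that case.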

In [None]:
# use DateRangeHLSStream to download the 60 seconds of audio leading up to annot_unix_time

stream = DateRangeHLSStream(stream_base = stream_base,
    polling_interval = 60,
    start_unix_time = annot_unix_time - 60,
    end_unix_time = annot_unix_time,
    wav_dir = './',
    overwrite_output=True)
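The requested window runs from `lag` seconds before the annotation up to the annotation itself. To sanity-check which audio will be fetched, a small hypothetical helper, `window_bounds`, can turn the Unix bounds back into readable UTC times:

```python
from datetime import datetime, timezone


def window_bounds(annot_unix_time, lag=60):
    """Return the (start, end) of the requested segment as UTC datetimes."""
    start = datetime.fromtimestamp(annot_unix_time - lag, tz=timezone.utc)
    end = datetime.fromtimestamp(annot_unix_time, tz=timezone.utc)
    return start, end


start, end = window_bounds(1577836800.0)
print(start.isoformat(), end.isoformat())
# 2019-12-31T23:59:00+00:00 2020-01-01T00:00:00+00:00
```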

In [None]:
stream.get_next_clip()

Set the parameters used to extract the segments.

In [None]:
# seconds before each annotation timestamp at which a segment starts
lag = 60

# polling interval in seconds, passed to DateRangeHLSStream
polling_interval = 60

# directory in which to store the .wav files
OUTPUT_PATH = './'

In [None]:
def download_segments(df_key, lag=60, polling_interval=60, output_path='./'):
  """
  Download one .wav segment for every observation in a data frame.

  Inputs
  ------
  df_key: pandas DataFrame of observations; needs 'timestamp' and 'feed_id' columns
  lag: seconds before each annotation timestamp at which the segment starts
  polling_interval: polling interval in seconds, passed to DateRangeHLSStream
  output_path: directory in which to store the segment files
  """
  for _, row in df_key.iterrows():
    # annotation time (assumed UTC) as a Unix timestamp in seconds
    annot_unix_time = pd.to_datetime(row['timestamp']).replace(tzinfo=timezone.utc).timestamp()
    # HLS stream of the hydrophone that produced this annotation (uses the global feed_dict)
    stream_base = 'https://s3-us-west-2.amazonaws.com/streaming-orcasound-net/rpi_' + feed_dict[row['feed_id']]
    stream = DateRangeHLSStream(stream_base = stream_base,
      polling_interval = polling_interval,
      start_unix_time = annot_unix_time - lag,
      end_unix_time = annot_unix_time,
      wav_dir = output_path,
      overwrite_output=True)
    stream.get_next_clip()
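`download_segments` has no error handling, so a single missing stream folder or network error ends the whole batch. A hedged sketch of a hypothetical wrapper, `download_all`, that records failures and carries on; `download_one` stands in for the body of the loop above and is passed in so the sketch runs without network access.

```python
def download_all(rows, download_one):
    """Call download_one(row) for each row, recording failures instead of stopping.

    Hypothetical helper: in the notebook, download_one would build stream_base,
    create the DateRangeHLSStream and call get_next_clip for one observation.
    """
    failures = []
    for i, row in enumerate(rows):
        try:
            download_one(row)
        except Exception as exc:  # e.g. network errors or missing stream folders
            failures.append((i, repr(exc)))
    return failures


# Toy usage with a stand-in download function that fails on odd inputs.
def fake_download(row):
    if row % 2:
        raise RuntimeError('no data')


print(download_all([0, 1, 2, 3], fake_download))
# [(1, "RuntimeError('no data')"), (3, "RuntimeError('no data')")]
```

Because `KeyboardInterrupt` is not a subclass of `Exception`, interrupting the cell still stops the loop.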


In [None]:
# download all segments (this will take a while)
download_segments(df_key, lag=lag, polling_interval=polling_interval, output_path=OUTPUT_PATH)

Move the `.wav` files into a single folder and zip them for export.

In [None]:
!mkdir -p wavs

In [None]:
mv *.wav wavs/

In [None]:
ls

human-detections.csv  rpi-port-townsend_2020_07_03_17_35_18.wav  sample_data/
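The file names appear to follow the pattern `rpi-<node>_<YYYY_MM_DD_HH_MM_SS>.wav`, inferred from the listings in this notebook rather than from the `orca_hls_utils` documentation. To match a `.wav` back to its annotation, a hypothetical parser, `parse_wav_name`, can split a name into the node and the embedded time; treating that time as UTC is an assumption.

```python
import re
from datetime import datetime, timezone

# Pattern inferred from names such as rpi-port-townsend_2020_07_03_17_35_18.wav
WAV_NAME = re.compile(r'^rpi-(?P<node>[a-z-]+)_(?P<stamp>\d{4}(?:_\d{2}){5})\.wav$')


def parse_wav_name(name):
    """Return (node, time) for a segment file name; the time is assumed to be UTC."""
    m = WAV_NAME.match(name)
    if m is None:
        raise ValueError(f'unexpected file name: {name}')
    when = datetime.strptime(m.group('stamp'), '%Y_%m_%d_%H_%M_%S')
    return m.group('node'), when.replace(tzinfo=timezone.utc)


node, when = parse_wav_name('rpi-bush-point_2020_06_18_15_17_41.wav')
print(node, when.isoformat())  # bush-point 2020-06-18T15:17:41+00:00
```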


In [None]:
!zip -r wavs.zip wavs

  adding: wavs/ (stored 0%)
  adding: wavs/rpi-port-townsend_2020_07_04_15_56_55.wav (deflated 12%)
  adding: wavs/rpi-orcasound-lab_2020_07_05_23_02_37.wav (deflated 51%)
  adding: wavs/rpi-bush-point_2020_06_18_15_17_41.wav (deflated 48%)
  adding: wavs/rpi-bush-point_2020_07_11_14_35_04.wav (deflated 56%)
  adding: wavs/rpi-bush-point_2020_09_01_17_23_25.wav (deflated 47%)
  adding: wavs/rpi-orcasound-lab_2020_10_20_05_01_33.wav (deflated 15%)
  adding: wavs/rpi-bush-point_2020_11_22_18_19_02.wav (deflated 18%)
  adding: wavs/rpi-port-townsend_2020_07_07_12_30_30.wav (deflated 22%)
  adding: wavs/rpi-bush-point_2020_07_11_17_01_13.wav (deflated 15%)
  adding: wavs/rpi-port-townsend_2020_07_03_17_35_18.wav (deflated 15%)
  adding: wavs/rpi-bush-point_2020_09_28_06_00_29.wav (deflated 14%)
  adding: wavs/rpi-orcasound-lab_2020_10_20_05_10_33.wav (deflated 15%)
  adding: wavs/rpi-orcasound-lab_2020_07_05_22_53_26.wav (deflated 45%)
  adding: wavs/rpi-port-townsend_2020_06_02_08_34_02.w