<a href="https://colab.research.google.com/github/sedwardsmarsh/Marine-Mammal-Classifier/blob/master/Marine_Mammal_Classifier.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Marine Mammal Classifier**


*   source for audio data: Watkins Marine Mammal Sound Database, Woods Hole Oceanographic Institution: https://whoicf2.whoi.edu/science/B/whalesounds/index.cfm
*   thanks to Todd Hayton for the Python tutorial *Scraping by Example - Iterating through Select Items With Mechanize*: http://toddhayton.com/2015/01/09/scraping-by-example-ntu-edu/






Before running anything, switch the Colab runtime to a GPU. Click the ‘Runtime’ tab and select ‘Change runtime type’. In the pop-up window, select ‘GPU’ from the drop-down menu and click ‘Save’.

# ***make these images a lot smaller***

![Click the 'Runtime' tab above and select 'Change runtime type'](https://course.fast.ai/images/colab/03.png)

![A pop-up window will open up with a drop-down menu. Select ‘GPU’ from the menu and click ‘Save’.](https://course.fast.ai/images/colab/04.png)
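
After switching the runtime, it can help to confirm that a GPU is actually attached. The cell below is a minimal sketch, not part of the original notebook. It assumes that a GPU runtime exposes the `nvidia-smi` tool and that `nvidia-smi -L` exits successfully only when a GPU is visible. The helper name `gpu_available` is made up for illustration.

```python
import shutil
import subprocess


def gpu_available():
    """Return True if nvidia-smi is on the PATH and lists GPUs successfully.

    Assumption: a working nvidia-smi implies a usable GPU runtime.
    """
    exe = shutil.which('nvidia-smi')
    if exe is None:
        # No NVIDIA tooling installed, so treat this as a CPU-only runtime.
        return False
    try:
        result = subprocess.run(
            [exe, '-L'],  # -L lists the GPUs the driver can see
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


print('GPU runtime active:', gpu_available())
```

If this prints `False`, repeat the runtime steps above before running the training cells.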

# Set up the environment

In [25]:
# connect to google drive
from google.colab import drive
drive.mount('/content/gdrive', force_remount=True)
root_dir = "/content/gdrive/My Drive/"
data_dir = root_dir + "Colab Notebooks/watkins_data/"

Mounted at /content/gdrive


In [26]:
# create google drive directory to hold watkins marine mammal data
# (-p creates missing parents; the quotes keep the spaces in the path intact)
!mkdir -p "{data_dir}"
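
Google Drive paths such as `My Drive` and `Colab Notebooks` contain spaces, and a shell splits an unquoted path at each space. One way to sidestep shell quoting altogether is to create the directory from Python. The following is a minimal sketch under that assumption. The helper name `ensure_dir` is hypothetical, and the demo uses a temporary directory in place of the mounted Drive so it can run anywhere.

```python
import os
import tempfile


def ensure_dir(path):
    """Create path and any missing parents; succeed if it already exists."""
    os.makedirs(path, exist_ok=True)
    return os.path.isdir(path)


# Demo in a temporary location that mimics the Drive layout, spaces included.
base = tempfile.mkdtemp()
demo = os.path.join(base, 'My Drive', 'Colab Notebooks', 'watkins_data')
print(ensure_dir(demo))
```

In the notebook, the same call would take the Drive path, for example `ensure_dir("/content/gdrive/My Drive/Colab Notebooks/watkins_data/")`, once the drive is mounted.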


In [0]:
# fetch the latest fast.ai version 
!curl -s https://course.fast.ai/setup/colab | bash

In [0]:
# install the latest SoX version
!apt-get install -q sox

Reading package lists...
Building dependency tree...
Reading state information...
sox is already the newest version (14.4.2-3ubuntu0.18.04.1).
0 upgraded, 0 newly installed, 0 to remove and 25 not upgraded.


# ***upload this python script to github after finalizing it.***

In [2]:
# install the latest mechanize version
!pip install mechanize

Collecting mechanize
  Downloading https://files.pythonhosted.org/packages/13/08/77368b47ba2f9e0c03f33902ed2c8e0fa83d15d81dcf7fe102b40778d810/mechanize-0.4.5-py2.py3-none-any.whl (109kB)
Installing collected packages: mechanize
Successfully installed me

In [9]:
#!/usr/bin/env python

'''
website: https://whoicf2.whoi.edu/science/B/whalesounds/index.cfm

special thank you to the Watkins Marine Mammal Sound Database, 
Woods Hole Oceanographic Institution, for making these audio recordings 
free and publicly available.

33 options (including default "select" option) in common name drop down menu.
grab the select#getSpeciesCommon.value which is a url to each page.
'''

import sys
import signal
import mechanize 
import time

URL = 'https://whoicf2.whoi.edu/science/B/whalesounds/index.cfm'
DELAY = 5

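def sigint(signum, frame):
    # exit cleanly on Ctrl-C; the parameter is named signum so it does not
    # shadow the imported signal module
    sys.stderr.write('Exiting...\n')
    sys.exit(0)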

class WatkinsScraper:
    def __init__(self, url=URL, delay=DELAY):
        # initialize browser, url, delay and items array
        self.br = mechanize.Browser()
        self.url = url
        self.delay = delay
        self.items = []


    def scrape(self):
        '''
        Get the list of items in the first dropdown menu, "Common name" 
        and submit the form for each item. 
        '''
        items = self.get_items()

        for item in items:
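            # Skip the blank/default "Select" option; every other item is a species
            labels = [label.text for label in item.get_labels()]
            if not labels or labels[0].strip().lower().startswith('select'):
                continue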

            results = self.submit_form(item)
            self.save_item_results(item, results)


    def get_items(self):
        '''
        Get the list of items in the first dropdown of the form
        '''
        self.br.open(self.url)
        self.br.select_form('jump1')

        # get the option items from the 'getSpeciesCommon' select control
        items = self.br.form.find_control('getSpeciesCommon').get_items()
        return items


    def submit_form(self, item):
        '''
        Submit the form with item.name selected in the "Common name" 
        dropdown and return the response body, retrying on HTTP/URL errors
        '''
        max_tries = 3
        num_tries = 0

        while num_tries < max_tries:
            # try the request; leave the loop as soon as it succeeds.
            try:
                self.br.open(self.url)
                self.br.select_form('jump1')
                self.br.form['getSpeciesCommon'] = [item.name]
                self.br.submit()
                break
            # on an HTTP/URL error, report it, then back off and retry.
            except (mechanize.HTTPError, mechanize.URLError) as e:
                if isinstance(e, mechanize.HTTPError):
                    print(e.code)
                else:
                    print(e.reason.args)

            num_tries += 1
            time.sleep(num_tries * self.delay)

        if num_tries == max_tries:
            # a bare `raise` has no active exception here, so raise explicitly
            raise RuntimeError('giving up on %s after %d tries'
                               % (item.name, max_tries))

        # return page response from server.
        return self.br.response().read()


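    def save_item_results(self, item, results):
        '''
        Build a filename-safe label for the item and return it
        '''
        label = ' '.join([label.text for label in item.get_labels()])
        label = '-'.join(label.split())

        # results is bytes, so the file would need binary mode
        # with open("%s.html" % label, 'wb') as f:
        #     f.write(results)

        return label



if __name__ == '__main__':
    signal.signal(signal.SIGINT, sigint)
    scraper = WatkinsScraper()
    scraper.scrape()
    some_items = scraper.get_items()
    # some_items[0] is the default "select" option, so use the next item
    results = scraper.submit_form(some_items[1])
    token = scraper.save_item_results(item=some_items[1], results=results)
    print(token)
    # for x in zip(some_items): 
    #     print(x)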
    

Atlantic-Spotted-Dolphin
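
The retry logic in `submit_form` can be pulled out and exercised without touching the network. The block below is a standalone sketch of that pattern: retry with a linearly growing delay. It is not the notebook's own code. The names `with_retries`, `action`, `errors` and `sleep` are assumptions introduced for illustration, and the `sleep` parameter exists only so the demo can record the waits instead of actually pausing. Unlike the loop in `submit_form`, this version does not sleep after the final failed attempt.

```python
import time


def with_retries(action, max_tries=3, delay=5, errors=(Exception,),
                 sleep=time.sleep):
    """Call action() until it succeeds or max_tries attempts have failed.

    The wait before retry number n is n * delay seconds (linear backoff).
    """
    last_error = None
    for attempt in range(1, max_tries + 1):
        try:
            return action()
        except errors as e:
            last_error = e
            if attempt < max_tries:
                sleep(attempt * delay)
    raise RuntimeError('gave up after %d tries' % max_tries) from last_error


# Demo: an action that fails twice and then succeeds.
calls = {'n': 0}


def flaky():
    calls['n'] += 1
    if calls['n'] < 3:
        raise IOError('temporary failure')
    return 'ok'


waits = []  # record the requested delays instead of sleeping
print(with_retries(flaky, delay=5, sleep=waits.append))
print(waits)
```

In the scraper, `action` would wrap the open/select/submit sequence and `errors` would be `(mechanize.HTTPError, mechanize.URLError)`.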
