<img src="https://coubsecure-s.akamaihd.net/get/b90/p/coub/simple/cw_timeline_pic/18b7e7583a5/ff29f23bd5ad52626a2d2/big_1411093836_1393413548_image.jpg" width="700">

<br>
The data page suggests we take into account where birds live because birds in different areas may make different sounds. This notebook attempts to do the following things:

 - See where different species are found.
 - Compare calls for migrating species.
 - Compare calls for geo-diverse species.
 
For the samples I examine here, there is evidence that birds of the same species sound different depending on where they are. These features may prove useful:
 - Bird location
 - Seasonal range
 - Harmonic amplitude and percussive amplitude
 - Call intervals (peak-to-peak)
 
Below are two summary plots that appear later in the notebook. There are also audio clips and plots for individual calls.

In [None]:
%%HTML

<table><tr>
<td> <img src="https://i.imgur.com/Fe96nJK.png" width=500> </td>
<td> <img src="https://i.imgur.com/3fPbUj6.png" width=400> </td>
</tr></table>

In [None]:
# ! conda update -n base -c defaults -y conda
! conda install -c conda-forge -y ffmpeg # librosa

In [None]:
import re
import certifi
import urllib3
from pathlib import Path

from scipy import signal
from dask import bag, diagnostics
from IPython.display import Audio, HTML
import librosa
import numpy as np
import pandas as pd
import colorcet as cc
import geoviews as gv
import holoviews as hv
from holoviews.operation.datashader import datashade, dynspread
hv.extension('bokeh')

## Overview

There are over 21,000 birds from 264 species represented in the metadata. Around 300 of the birds have no location specified.

In [None]:
train = pd.read_csv('../input/birdsong-recognition/train.csv')
print(f"{train.ebird_code.nunique()} species, {len(train)} birds.")
print("subdirectories and files")
!ls -d ../input/birdsong-recognition/train_audio/* |wc

In [None]:
train = train.assign(latitude = pd.to_numeric(train.latitude, 
                                              errors='coerce'),
                     longitude = pd.to_numeric(train.longitude, 
                                               errors='coerce'),
                     date = pd.to_datetime(train.date, format="%Y-%m-%d",
                                           errors='coerce'),
                     month = lambda x: x.date.dt.month) \
             .dropna(subset=['latitude', 'longitude', 'date'])
    
print(len(train), train.species.nunique())
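
To make the cleaning step above concrete, here is a minimal sketch on a toy table. The values (including the "Not specified" placeholder and the zeroed-out date) are made up for illustration and are not taken from the real metadata. It shows how `errors='coerce'` turns unparseable entries into missing values that `dropna` then removes.

```python
import pandas as pd

# Toy metadata (hypothetical values) with the kinds of bad entries
# that coercion is meant to catch.
toy = pd.DataFrame({
    "latitude": ["44.79", "Not specified", "35.1"],
    "longitude": ["-92.9", "Not specified", "-106.6"],
    "date": ["2013-05-25", "2013-06-01", "0000-00-00"],
})

clean = toy.assign(
    # Text that is not a number becomes NaN
    latitude=pd.to_numeric(toy.latitude, errors="coerce"),
    longitude=pd.to_numeric(toy.longitude, errors="coerce"),
    # Dates that do not parse become NaT
    date=pd.to_datetime(toy.date, format="%Y-%m-%d", errors="coerce"),
    # The lambda sees the frame with the columns assigned above
    month=lambda x: x.date.dt.month,
).dropna(subset=["latitude", "longitude", "date"])

print(len(toy), "rows before,", len(clean), "rows after")
```

Only rows with a usable latitude, longitude, and date survive, which is why the row count drops after cleaning.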

It's really easy to make interactive plots with the [HoloViz](https://holoviz.org/) libraries. One of the packages, GeoViews, is designed for geographic plots. Combined with Datashader, it scales to hundreds of millions of points on a regular laptop. The only drawback is that the notebook must be running live (as in edit mode) to get the full effects of zooming and datashading.

Brighter spots indicate higher bird density.

In [None]:
%%opts RGB {+axiswise} [width=650 height=620 xaxis=None yaxis=None] (alpha=0.3)

# Just 4 lines
points_trn = gv.Points(train, kdims=['longitude', 'latitude'], vdims='species')
spots_trn = datashade(points_trn, cmap=cc.kbc, normalization='eq_hist') 
tiles = gv.tile_sources.CartoLight
tiles*dynspread(spots_trn, threshold=1.0)

The birds can be grouped by genus, which is the first word of each species' scientific name. The top 21 genera by count give us 37% of the birds.

In [None]:
genera = train.sci_name.str.extract('([a-zA-Z]+ )', expand=False).value_counts(normalize=True).cumsum()
genera.head(21)
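
As a rough illustration of what the one-liner above computes, the sketch below runs the same expression on a handful of scientific names. The short list is hypothetical; the real notebook uses `train.sci_name`.

```python
import pandas as pd

# Hypothetical scientific names standing in for train.sci_name.
sci_name = pd.Series([
    "Contopus sordidulus", "Contopus virens", "Contopus cooperi",
    "Corvus corax", "Corvus brachyrhynchos", "Empidonax alnorum",
])

# Same expression as the notebook: grab the first word plus its trailing
# space, then turn per-genus counts into a running share of all rows.
genera = (sci_name.str.extract('([a-zA-Z]+ )', expand=False)
                  .value_counts(normalize=True)
                  .cumsum())
print(genera)
```

Each value is the cumulative fraction of rows covered by that genus and all more common genera, so the last entry is always 1. The trailing space kept by the regex means a later `str.contains` match on the genus cannot accidentally match a longer word that merely starts with the same letters.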

## Species and migration patterns

Here is where these birds have been spotted. Not all birds migrate, but those that do tend to move to where the warm weather is. ***Blue dots indicate March through August, orange dots indicate September through February.***

In [None]:
%%opts Points [width=250 height=220 xaxis=None yaxis=None] (alpha=0.3)


def plot_birds(train_part):
    points_trn = gv.Points(train_part, kdims=['longitude', 'latitude'], 
                           vdims='species')
    tiles = gv.tile_sources.CartoLight
    chart = tiles*points_trn
    return chart

    
layout_list = []
for genus in genera.index[:21]:  # `genus` avoids shadowing the built-in `type`
    train_part = train[train.sci_name.str.contains(genus)].copy()
    species_count = train_part.species.nunique()
    title = f"{genus.strip()} - ({species_count} species)"
    
    mar_aug = train_part[train_part.month.between(3,8)]
    sep_feb = train_part[(train_part.month < 3) | (train_part.month > 8)]
    if not mar_aug.empty:
        p1 = plot_birds(mar_aug)
        if not sep_feb.empty:
            p2 = plot_birds(sep_feb)
            layout_list.append((p1*p2).relabel(title))
        else: 
            layout_list.append(p1.relabel(title))
    elif not sep_feb.empty:
        p1 = p2 = plot_birds(sep_feb)  # force color consistency
        layout_list.append((p1*p2).relabel(title))
    
layout = hv.Layout(layout_list).cols(3)
display(layout)
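
The seasonal split used for the two dot colors can be sketched in plain Python. The `season` helper is a hypothetical name introduced only for this illustration; it mirrors `month.between(3, 8)`, which is inclusive at both ends by default, and its complement.

```python
def season(month):
    """Return the label used in the maps: Mar-Aug or Sep-Feb."""
    return "mar_aug" if 3 <= month <= 8 else "sep_feb"

# Every calendar month falls into exactly one of the two groups.
labels = {m: season(m) for m in range(1, 13)}
print(labels)
```

March and August land in the first group, and the remaining six months form the second group, so the two subsets never overlap.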

Here are the 3 species making up the genus Contopus. All three appear to migrate.

In [None]:
%%opts Points [width=250 height=220 xaxis=None yaxis=None] (alpha=0.3)

contopus = train.loc[train.sci_name.str.contains('Contopus'), 'species'].unique()
layout_list = []
for c in contopus:
    train_part = train[train.species == c].copy()
    mar_aug = train_part[train_part.month.between(3,8)]
    sep_feb = train_part[(train_part.month < 3) | (train_part.month > 8)]
    p1 = plot_birds(mar_aug)
    p2 = plot_birds(sep_feb)
    layout_list.append((p1*p2).relabel(c))  # title each panel with its species name
    
hv.Layout(layout_list).cols(3)    

## Comparison of calls during migration


The calls below are from one of the migrating Contopii, the Western Wood Pewee, aka Contopus sordidulus. We can look at summer calls vs. winter calls.

<img src="https://upload.wikimedia.org/wikipedia/commons/9/96/Contopus_sordidulus_1.jpg" width="400">


In [None]:
def show_sounds(bird, filename):
    path = f"../input/birdsong-recognition/train_audio/{bird}/{filename}"
    display(Audio(path))

    y_, sr = librosa.load(path)
    y_harm, y_perc = librosa.effects.hpss(y_[:100_000])
    x_ = np.arange(len(y_harm))/sr  # time axis in seconds; also works for clips shorter than 100,000 samples
    xy_harm = np.stack((x_,y_harm), axis=1)
    xy_perc = np.stack((x_,y_perc), axis=1)
    
    gram = hv.Curve(xy_harm).options(line_alpha=0.4) * hv.Curve(xy_perc).options(line_alpha=0.4)
    display(gram.options(width=900, height=200))

In [None]:
display(HTML('<h3 style="color:dodgerblue">Summer (North America)</h3>'))

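# Select the Western Wood Pewee explicitly rather than relying on train_part left over from the loop above
train_part = train[train.ebird_code == "wewpew"].copy()
summer_files = train_part.loc[train_part.month.between(3,8), 'filename'].tolist()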
for filename in summer_files[:3]:
    show_sounds("wewpew", filename)


In [None]:
display(HTML('<h3 style="color:red">Winter (North America)</h3>'))

winter_files = train_part.loc[(train_part.month < 3) | (train_part.month > 8), 'filename'].tolist()
for filename in winter_files[3:6]:
    show_sounds("wewpew", filename)


At first I didn't hear much difference between the two seasons compared to differences within seasons. But you see from the waveplots that the harmonic component of the call is more pronounced in the winter. Now when I listen to the first summer call and the first winter call I can hear the difference. It's like the bird is on vacation in South America and having fun!

The plot below compares all the Western Wood Pewee recordings in the dataset. The winter birds generally have a higher harmonic component, relative to the percussive component, in their calls.

In [None]:
def max_amp(filename, bird):
    path = f"../input/birdsong-recognition/train_audio/{bird}/{filename}"
    y, _ = librosa.load(path)
    y_harm, y_perc = librosa.effects.hpss(y[:100_000])
    return y_harm.max(), y_perc.max()
   
# Use dask for multiprocessing
def get_points(files, bird):
    file_bag = bag.from_sequence(files).map(max_amp, bird)
    with diagnostics.ProgressBar():
        max_list = file_bag.compute()
    return np.array(max_list)

summer_points = get_points(summer_files, "wewpew")
winter_points = get_points(winter_files, "wewpew")

In [None]:
metric = "Relative Harmonicness of the Western Wood Pewee"
s = hv.BoxWhisker(summer_points[:,0]/summer_points[:,1], label="Summer").options(box_fill_color="orangered")
w = hv.BoxWhisker(winter_points[:,0]/winter_points[:,1], label="Winter").options(box_fill_color="dodgerblue")
(s*w).options(width=500, height=300,title_format=metric)
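
The quantity plotted above is the ratio of the largest harmonic amplitude to the largest percussive amplitude in each clip. Here is a small sketch with made-up numbers, laid out the same way as the array returned by `get_points` (one row per recording, harmonic maximum first).

```python
import numpy as np

# Hypothetical (max harmonic, max percussive) pairs, one row per recording.
points = np.array([
    [0.30, 0.60],
    [0.45, 0.45],
    [0.80, 0.40],
])

# A ratio above 1 means the harmonic peak is louder than the percussive peak.
ratio = points[:, 0] / points[:, 1]
print(ratio)
print("median:", np.median(ratio))
```

Because it is a ratio, the metric does not depend on how loud the recording is overall, which helps when clips were recorded at different distances or gain levels.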

## Bird dialects

Here I'll see if birds from different countries have different accents. The genus Corvus includes crows, ravens, and rooks - those classic black birds found across the world.

The Northern Raven, aka Corvus corax, is a good example of a bird that lives in both Europe and North America. Ravens are rather large birds and popular stars in science fiction and tales of horror.

<img src="https://res-1.cloudinary.com/ebirdr/image/upload/s--uDfeyPbI--/f_auto,q_auto,t_full/3375-common-raven.jpg" width="400">




In [None]:
%%opts Points [width=350 height=320 xaxis=None yaxis=None] (alpha=0.4)

corvus = "Northern Raven"
train_part = train[train.species == corvus].copy()
mar_aug = train_part[train_part.month.between(3,8)]
sep_feb = train_part[(train_part.month < 3) | (train_part.month > 8)]
p1 = plot_birds(mar_aug)
p2 = plot_birds(sep_feb)
(p1*p2).relabel(corvus)  # use the species name directly instead of a positional column lookup


Let's compare sounds across the continents.

In [None]:
display(HTML('<h3 style="color:dodgerblue">American Raven</h3>'))

america_files = train_part.loc[train_part.longitude < -50, 'filename'].tolist()
for filename in america_files[1:4]:
    show_sounds("comrav", filename)

In [None]:
display(HTML('<h3 style="color:dodgerblue">European Raven</h3>'))

europe_files = train_part.loc[train_part.longitude > -50, 'filename'].tolist()
for filename in europe_files[8:11]:
    show_sounds("comrav", filename)  
    


The harmonic and percussive components seem mostly consistent across continents, except for a couple of the European birds, which are more sing-songy. One thing you might notice is that the American birds often have evenly spaced calls, whereas the European birds have more variation in the interval.

The plot below shows intervals for the samples above. Looking across all raven calls is harder because of the varying nature of the recordings. These challenges could be overcome with a more sophisticated peak-detection function.


In [None]:
def peak_deviation(filename, bird):
    path = f"../input/birdsong-recognition/train_audio/{bird}/{filename}"
    y, _ = librosa.load(path)
    _, y_perc = librosa.effects.hpss(y[:100_000])
    peaks, _ = signal.find_peaks(y_perc, prominence=0.6*y_perc.max(), distance=3000)
    if len(peaks) > 2:
        intervals = np.diff(peaks[:5])
        std = np.std(intervals[intervals>3000])
    else:
        std = 0
    return std


# Use dask with the threaded scheduler to process files in parallel
def get_var(files, bird):
    file_bag = bag.from_sequence(files).map(peak_deviation, bird)
    with diagnostics.ProgressBar():
        var_list = file_bag.compute(scheduler="threads")
    return np.array(var_list)
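
To show what the interval statistic measures, here is a simplified sketch on synthetic signals. It skips the harmonic/percussive separation and the first-five-peaks limit used in `peak_deviation`, and the `interval_std` helper and spike times are assumptions made up for the illustration. It reuses the same `signal.find_peaks` arguments on a train of unit spikes.

```python
import numpy as np
from scipy import signal

SR = 22050  # default sample rate of librosa.load


def interval_std(peak_times, n=SR * 5):
    """Build a toy signal with unit spikes at peak_times (seconds) and
    return the standard deviation, in seconds, of the gaps between peaks."""
    y = np.zeros(n)
    idx = (np.asarray(peak_times) * SR).astype(int)
    y[idx] = 1.0
    peaks, _ = signal.find_peaks(y, prominence=0.6 * y.max(), distance=3000)
    return np.std(np.diff(peaks)) / SR


even = interval_std([0.5, 1.5, 2.5, 3.5])    # evenly spaced calls
uneven = interval_std([0.5, 1.0, 2.5, 3.0])  # irregular calls
print(even, uneven)
```

Evenly spaced spikes give a standard deviation of zero, while irregular spacing gives a larger value, which is the pattern the box plot below is meant to reveal.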

In [None]:
americans = get_var(america_files[1:4], "comrav")
europeans = get_var(europe_files[8:12], "comrav")

In [None]:
%%opts BoxWhisker [width=400]

a = hv.BoxWhisker(americans[americans!=0]/22050, label="American Raven").options(box_fill_color="red")
e = hv.BoxWhisker(europeans[europeans!=0]/22050, label="European Raven").options(box_fill_color="royalblue")
(a*e).relabel("StDev of Cawing Interval")


## Closing
Overall this seems to be a very challenging task. These features may prove useful:
 - Bird location
 - Seasonal range
 - Harmonic amplitude and percussive amplitude
 - Call intervals between peaks
