# BirdCLEF 2021: Visual tour of some non-target sounds in the training soundscapes

When trying to identify birds by their sounds, one necessary step is learning to ignore non-target sounds. Many birders (at a range of skill levels) have been fooled by squirrels, and I'm sure computers can be thrown off by certain non-bird sounds as well. Non-target sounds range from natural (insects, frogs, rain) to artificial (cars, planes). They also range from sounds that could readily be confused with birds (chipmunks, squirrels) to those that are more likely to simply obscure certain bird sounds (heavy traffic). Even birds themselves can cause confusion; for example, domestic birds such as *Gallus gallus domesticus* (the chicken) are generally considered non-target. See [eBird guidelines on domestic birds](https://support.ebird.org/en/support/solutions/articles/48000795623-ebird-rules-and-best-practices).

I've listened to many of the training soundscapes, and have heard a lot of non-target sounds. I thought it would be interesting to put together a visual tour to highlight what these look like in spectrogram view.

The training soundscapes are from two locations: 
* Ithaca, New York, USA (abbreviation SSW, for Sapsucker Woods at the Cornell Lab of Ornithology)
* Alajuela, San Ramón, Costa Rica (abbreviation COR)

Disclaimer: These are my interpretations of what I'm hearing (and seeing) based on experience birding and listening,  primarily in the United States. Alternative interpretations are welcome; I'll update with corrections as needed.  

## Notes on the spectrograms
I'm using linear-scaled spectrograms here. Note that not all plots have the same y-axis scale; for situations without much going on in the high-frequency range, I've changed the maximum y value. All of these show 15-second clips, and I've noted what part of the original they come from.
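
Since each plot title states which part of the original recording the clip covers, a small helper can derive that range from the `offset` and `duration` values passed to `librosa.load`, which avoids arithmetic slips in hand-written titles. This is only a sketch; `mmss` and `clip_label` are hypothetical helper names, not part of librosa or the competition code.

```python
def mmss(seconds):
    """Format a whole number of seconds as m:ss (for example 155 -> '2:35')."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}:{secs:02d}"


def clip_label(offset, duration=15):
    """Return the start-end range of a clip within the original recording."""
    return f"{mmss(offset)}-{mmss(offset + duration)}"


print(clip_label(155))  # 2:35-2:50
print(clip_label(0))    # 0:00-0:15
```

The result could be dropped into a title with an f-string, for example `plt.title(f"Rain in Ithaca, {clip_label(155)} of the original recording")`.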

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import librosa
import librosa.display
import matplotlib.pyplot as plt
import os
import warnings
warnings.filterwarnings(action='ignore')

# Rain
Location: Ithaca (SSW). Multiple recordings are from July 1, 2017, which seems to have been a wet day. The first example is a brief downpour that rapidly intensifies and fades off. Rain shows up on the spectrogram as a series of vertical streaks across a wide range of frequencies. This particular example happens not to have birds in it, but one should not assume that birds simply go quiet during the rain; in fact, sometimes they are very active. See the next example for birds plus rain.
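
One rough way to put a number on "streaks across a wide range of frequencies" is spectral flatness: the ratio of the geometric mean to the arithmetic mean of the power spectrum, which is close to 1 for broadband sound and close to 0 for a pure tone. The sketch below uses synthetic signals rather than the competition audio, and treating a raindrop hit as a single-sample click is a simplifying assumption of mine. `spectral_flatness` is a hypothetical helper written for this illustration.

```python
import numpy as np


def spectral_flatness(frame):
    """Geometric mean / arithmetic mean of the power spectrum (between 0 and 1)."""
    power = np.abs(np.fft.rfft(frame)) ** 2 + 1e-12  # small floor avoids log(0)
    return float(np.exp(np.mean(np.log(power))) / np.mean(power))


sr = 32000            # same sample rate as the clips in this notebook
n = 2048              # one analysis frame
t = np.arange(n) / sr

# Assumption: a raindrop hit behaves roughly like a very short click.
click = np.zeros(n)
click[n // 2] = 1.0

# A steady whistle at 3500 Hz, standing in for a tonal bird note.
tone = np.sin(2 * np.pi * 3500 * t)

flat_click = spectral_flatness(click)
flat_tone = spectral_flatness(tone)
print(f"click: {flat_click:.3f}, tone: {flat_tone:.3f}")
```

The click spreads its energy evenly over every frequency bin (a vertical streak), while the tone concentrates it in one bin (a horizontal line).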

In [None]:
audio_path = '../input/birdclef-2021/train_soundscapes/26709_SSW_20170701.ogg'
sig, rate = librosa.load(audio_path, sr=32000, offset=155, duration=15)
spec = np.abs(librosa.stft(sig))  # magnitude only; amplitude_to_db expects real-valued amplitudes
spec_db = librosa.amplitude_to_db(spec, ref=np.max)
# Plot the spectrogram
plt.figure(figsize=(15, 5))
librosa.display.specshow(spec_db, 
                         sr=32000, 
                         x_axis='time', 
                         y_axis='hz', 
                         cmap=plt.get_cmap('viridis'))
plt.locator_params(axis="x", nbins=16)
plt.ylim((0,12000))
plt.title("Rain in Ithaca, 26709_SSW_20170701, 2:35-2:50 second mark of the original recording")

## Rain, with birds
Sapsucker Woods, again. Here's another example of rain, this time with bird sounds. Viewed this way, the streaks of the rain considerably degrade the quality of the parts of the spectrogram that actually represent bird vocalizations. 

In [None]:
audio_path = '../input/birdclef-2021/train_soundscapes/2782_SSW_20170701.ogg'
sig, rate = librosa.load(audio_path, sr=32000, offset=0, duration=15)
spec = np.abs(librosa.stft(sig))  # magnitude only; amplitude_to_db expects real-valued amplitudes
spec_db = librosa.amplitude_to_db(spec, ref=np.max)
# Plot the spectrogram
plt.figure(figsize=(15, 5))
librosa.display.specshow(spec_db, 
                         sr=32000, 
                         x_axis='time', 
                         y_axis='hz', 
                         cmap=plt.get_cmap('viridis'))
plt.locator_params(axis="x", nbins=16)
plt.ylim((0,12000))
plt.title("Rain in Ithaca, 2782_SSW_20170701, 0:00-0:15 second mark of the original recording")

# Traffic

Ithaca is indeed gorges (as their tourist slogan says), but it also has a lot of traffic. This particular recording has a steady drone of traffic in the background. The traffic shows up as low frequency sounds, below 2000 Hz. The intermittent horizontal lines between about 3000 and 4000 Hz are an actual bird, a singing Black-capped Chickadee. Like the chickadee, many birds vocalize at frequencies higher than the background traffic. However, background traffic would overlap the vocalization frequency range of certain low-pitched birds (Mourning Doves or Great Horned Owls, for example--both of which are part of this competition, but not this specific recording). 
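
To illustrate the frequency-overlap point, here is a minimal sketch that measures what fraction of a signal's spectral energy falls inside a given band. The signal is synthetic (a 200 Hz rumble plus a quieter 3500 Hz tone), chosen by me as stand-ins for traffic and a chickadee-range whistle; it is not taken from the recording. `band_energy_fraction` is a hypothetical helper name.

```python
import numpy as np


def band_energy_fraction(signal, sr, f_lo, f_hi):
    """Fraction of total spectral energy with f_lo <= frequency < f_hi (Hz)."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    in_band = (freqs >= f_lo) & (freqs < f_hi)
    return float(power[in_band].sum() / power.sum())


sr = 32000
t = np.arange(sr) / sr  # one second of samples

# Synthetic stand-ins, not real data.
rumble = 1.0 * np.sin(2 * np.pi * 200 * t)    # low-frequency drone
whistle = 0.5 * np.sin(2 * np.pi * 3500 * t)  # quieter, higher-pitched tone
mix = rumble + whistle

low = band_energy_fraction(mix, sr, 0, 2000)
high = band_energy_fraction(mix, sr, 3000, 4000)
print(round(low, 2), round(high, 2))  # 0.8 0.2
```

A bird singing at 3000 to 4000 Hz sits in a band the rumble barely touches, whereas a low-pitched species would have to compete with most of the energy in the recording.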

The downward diagonal line at the 12 second mark on the image (2:32 of the original) is, I believe, a genuine bird, though very faint (and perhaps not definitively identifiable); it does not show up in train_soundscape_labels.csv.
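
To check a moment like this against the labels, it helps to know which labeled window it falls in. The sketch below rests on my understanding of train_soundscape_labels.csv, which should be verified against the file itself: labels cover 5-second windows, the `seconds` column holds the end time of each window, and `row_id` has the form `<audio_id>_<site>_<seconds>`. Both function names are hypothetical.

```python
import math


def label_window_end(time_s, window=5):
    """End time (in seconds) of the fixed-length window containing time_s."""
    return (math.floor(time_s / window) + 1) * window


def row_id(audio_id, site, time_s, window=5):
    """Build a row identifier, assuming the <audio_id>_<site>_<seconds> layout."""
    return f"{audio_id}_{site}_{label_window_end(time_s, window)}"


# 2:32 of the original recording is 152 seconds in.
print(row_id(10534, "SSW", 152))  # 10534_SSW_155
```

A time that lands exactly on a boundary (say, 155 s) is assigned here to the window that starts at that boundary, which is a design choice rather than something the data dictates.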

In [None]:
audio_path = '../input/birdclef-2021/train_soundscapes/10534_SSW_20170429.ogg'
sig, rate = librosa.load(audio_path, sr=32000, offset=140, duration=15)
spec = np.abs(librosa.stft(sig))  # magnitude only; amplitude_to_db expects real-valued amplitudes
spec_db = librosa.amplitude_to_db(spec, ref=np.max)
# Plot the spectrogram
plt.figure(figsize=(15, 5))
librosa.display.specshow(spec_db, 
                         sr=32000, 
                         x_axis='time', 
                         y_axis='hz', 
                         cmap=plt.get_cmap('viridis'))
plt.locator_params(axis="x", nbins=16)
plt.ylim((0,6000))
plt.title("Traffic in Ithaca, 10534_SSW_20170429, 2:20-2:35 second mark of the original recording")

## Honking (car, not geese)

Location: Costa Rica. Seems to be a car horn, activated repeatedly, showing up here between the 1 and 11 second marks, at frequencies below 4000 Hz. I'm not certain what the persistent sounds are at around 5000 Hz; insects would be my first guess.  

In [None]:
audio_path = '../input/birdclef-2021/train_soundscapes/50878_COR_20191004.ogg'
sig, rate = librosa.load(audio_path, sr=32000, offset=455, duration=15)
spec = np.abs(librosa.stft(sig))  # magnitude only; amplitude_to_db expects real-valued amplitudes
spec_db = librosa.amplitude_to_db(spec, ref=np.max)
# Plot the spectrogram
plt.figure(figsize=(15, 5))
librosa.display.specshow(spec_db, 
                         sr=32000, 
                         x_axis='time', 
                         y_axis='hz', 
                         cmap=plt.get_cmap('viridis'))
plt.locator_params(axis="x", nbins=16)
plt.ylim((0,8000))
plt.title("Honking in Costa Rica, 50878_COR_20191004, 7:35-7:50 second mark of the original recording")

# Rooster and dogs and insects, oh my!

Costa Rica: This is not a deep rainforest site, but rather a human-inhabited rural landscape. 
1. Rooster: Below 2000 Hz and between approximately the 2 & 4 second marks.
2. Dog(s): Below 2000 Hz and between the 8 & 12 second marks. (Not shown on this spectrogram, but in the same recording, between 1:24 and 1:29, there are two different dogs: first one with a deeper voice, followed by a slightly higher-pitched yappy one.)
3. Insects: Two species, one at about 3000 Hz and one at about 4000 Hz.

Mystery: Some sounds are showing up just above 6000 Hz. My ear can't pick them out. Any ideas?  
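
When the ear can't pick out a faint sound, one option is to pin down its frequency numerically and compare that with candidate sources. Below is a minimal sketch that finds the strongest frequency inside a chosen band. The clip is synthetic: the tone frequencies, amplitudes, and noise level are invented for illustration and are not measurements from this recording. `peak_frequency` is a hypothetical helper name.

```python
import numpy as np


def peak_frequency(signal, sr, f_lo, f_hi):
    """Frequency (Hz) of the strongest spectral component within [f_lo, f_hi]."""
    windowed = signal * np.hanning(len(signal))  # taper the ends to reduce leakage
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    band = (freqs >= f_lo) & (freqs <= f_hi)
    return float(freqs[band][np.argmax(spectrum[band])])


sr = 32000
t = np.arange(sr) / sr  # one second of samples
rng = np.random.default_rng(0)

# Invented stand-in: two loud tones, one faint higher tone, and background noise.
clip = (
    1.0 * np.sin(2 * np.pi * 3000 * t)
    + 0.8 * np.sin(2 * np.pi * 4000 * t)
    + 0.1 * np.sin(2 * np.pi * 6100 * t)
    + 0.05 * rng.standard_normal(sr)
)

faint = peak_frequency(clip, sr, 5500, 7000)
print(faint)
```

Restricting the search to a band keeps the louder components from dominating the result.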

In [None]:
audio_path = '../input/birdclef-2021/train_soundscapes/7019_COR_20190904.ogg'
sig, rate = librosa.load(audio_path, sr=32000, offset=389, duration=15)
spec = np.abs(librosa.stft(sig))  # magnitude only; amplitude_to_db expects real-valued amplitudes
spec_db = librosa.amplitude_to_db(spec, ref=np.max)
# Plot the spectrogram
plt.figure(figsize=(15, 5))
librosa.display.specshow(spec_db, 
                         sr=32000, 
                         x_axis='time', 
                         y_axis='hz',
                         cmap=plt.get_cmap('viridis'))
plt.locator_params(axis="x", nbins=16)
plt.ylim((0,8000))
plt.title("Costa Rica, 7019_COR_20190904, 6:29-6:44 second mark of the original recording")

# Flowing water; human cough

Location: Costa Rica. This sounds like it's next to a gurgling stream, which I interpret as showing up as a jumble of sound below 2000 Hz. 

At the 1 second mark in the view below, there's a human cough (showing up across a broad frequency range), followed shortly thereafter by a throat clearing.

There's also a genuine bird (a Rufous-breasted Wren, rubwre1, according to the metadata), one with a nice, loud whistle. Often, it seems that birds that make use of stream-side habitat are loud vocalists. (The Louisiana Waterthrush I hear as I'm writing this is another such example.)

I'm not sure about the bright line at 6000 Hz. Insects again? Thoughts, anyone? 

In [None]:
audio_path = '../input/birdclef-2021/train_soundscapes/11254_COR_20190904.ogg'
sig, rate = librosa.load(audio_path, sr=32000, offset=16, duration=15)
spec = np.abs(librosa.stft(sig))  # magnitude only; amplitude_to_db expects real-valued amplitudes
spec_db = librosa.amplitude_to_db(spec, ref=np.max)
# Plot the spectrogram
plt.figure(figsize=(15, 5))
librosa.display.specshow(spec_db, 
                         sr=32000, 
                         x_axis='time', 
                         y_axis='hz',
                         cmap=plt.get_cmap('viridis'))
plt.locator_params(axis="x", nbins=16)
plt.ylim((0,8000))
plt.title("Costa Rica, 11254_COR_20190904, 0:16-0:31 second mark of the original recording")

# Frogs (?)

Location: Costa Rica. This is out of my home turf, so I'm speculating that the dominant sound here (around 5000 Hz) could be frog calls. As stated earlier, I'm open to correction. The three pulses in this image at about 4000 Hz sound like insects to me.  

In [None]:
audio_path = '../input/birdclef-2021/train_soundscapes/7954_COR_20190923.ogg'
sig, rate = librosa.load(audio_path, sr=32000, offset=0, duration=15)
spec = np.abs(librosa.stft(sig))  # magnitude only; amplitude_to_db expects real-valued amplitudes
spec_db = librosa.amplitude_to_db(spec, ref=np.max)
# Plot the spectrogram
plt.figure(figsize=(15, 8))
librosa.display.specshow(spec_db, 
                         sr=32000, 
                         x_axis='time', 
                         y_axis='hz',
                         cmap=plt.get_cmap('viridis'))
plt.locator_params(axis="x", nbins=16)
plt.ylim((0,8000))
plt.title("Costa Rica, 7954_COR_20190923, 0:00-0:15 second mark of the original recording")

# Other possible sounds
That's just a brief survey of some examples I've found (so far) in the training soundscapes. It is not comprehensive. Thinking more broadly about what non-target sounds might interfere with audio recordings, one can come up with a long list of possibilities. For starters:

* Engine sounds: car, motorcycle, train, plane, helicopter, lawn mower, chain saw, ATV, generator, etc.
* Other loud artificial sounds: train whistle, tornado siren, emergency vehicle sirens, loud music (from passing vehicle or stationary source), gun shots
* Weather sounds: wind, rain, thunder, hail 
* Frogs & toads 
* Insects 
* Mammals (human voice, chipmunks, squirrels, and plenty more; list expands in the tropics to include monkeys and more)
* Domestic birds: chickens, geese, ducks, peafowl, for example
* Falling trees: If a tree falls in the woods, and audio equipment records it, will people *please* stop arguing about whether it made a sound?
* Sounds from interactions with the ground, such as footsteps of people or animals (good luck differentiating scratching sounds generated by a Wild Turkey or Eastern Towhee vs. a squirrel or raccoon)
* Miscellaneous: Birders using playback (hopefully not, but possible!), snapping electric fences (mostly a problem I encounter when recording in my home habitat)


## On the bright side...
We won't have to differentiate between chainsaws and chainsaw-mimicking birds in this competition. Check out the Superb Lyrebird from Australia in [my absolute favorite David Attenborough clip on YouTube](https://www.youtube.com/watch?v=mSB71jNq-yQ).


# Acknowledgments
The following resources have been especially helpful to me for getting started with audio in Python:

[BirdCLEF 2021 Processing Audio Data](https://www.kaggle.com/stefankahl/birdclef2021-processing-audio-data)

[Valerio Velardo's Audio Signal Processing for Machine Learning on YouTube](https://www.youtube.com/playlist?list=PL-wATfeyAMNqIee7cH3q1bh4QJFAaeNv0)

# License

Copyright 2021 J.M. Reuter

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
