Transformer based deep learning model for CS graduation project

farrinfedra/BirdsOfIstanbul

Birds of Istanbul iOS

Shazam for the birds of Istanbul!

This repository contains the code for the deep learning model behind the Birds of Istanbul iOS application.

Table of Contents

  1. Introduction
  2. Features
  3. Model
  4. Dataset
  5. Preprocessing
  6. Results
  7. Members
  8. References

Introduction

What is Birds of Istanbul?

An iOS application for classifying bird songs, developed for ornithologists, bird watchers, and anyone curious to explore the birds in their surroundings.

The Swift code for the iOS application lives in a separate repository on GitHub.

Features

What features does Birds of Istanbul offer?

  • You can record bird songs in the app, or upload previously recorded audio, and identify the species.
  • You can explore birds in your neighborhood and visualize them on the map.
  • Learn about the birds you have classified, as well as 400 species from different regions of Türkiye.

Model

All about the Birds of Istanbul Model.

The model is based on the Audio Spectrogram Transformer (AST) [1], pre-trained on 397 bird species and fine-tuned on 400 bird species from different regions of Türkiye. AST converts the input waveform of a bird song into a 128 × 100t log-mel spectrogram, where t is the clip length in seconds (128 mel bins, one frame every 10 ms), and splits it into a sequence of 16 × 16 patches. A linear projection layer maps each patch to a 1-dimensional patch embedding of size 768. Each patch embedding is combined with its learnable positional embedding, and a classification token (CLS) is added to the sequence, as shown in figure 6. The sequence is then fed into a Transformer encoder with 12 layers, 12 attention heads and an embedding dimension of 768. Finally, the encoder output at the CLS token, which serves as the representation of the whole spectrogram, is fed into a linear layer that produces the classification scores. Following the same process as the AST pre-trained on 397 bird species, our model predicts bird species in each 5-second chunk of an audio recording, and the species with the highest score is reported.
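
To make the shapes above concrete, here is a minimal sketch of the patch arithmetic, assuming the defaults from the AST paper [1]: 128 mel bins, about 100 frames per second (10 ms hop), and 16 × 16 patches taken with a stride of 10 (an overlap of 6). The helper names are hypothetical, and whether the Birds of Istanbul model keeps the overlapping stride is an assumption.

```python
from typing import Tuple


def ast_patch_grid(n_mels: int, n_frames: int, patch: int = 16,
                   stride: int = 10) -> Tuple[int, int, int]:
    """Return (frequency patches, time patches, total patches) for a
    spectrogram of n_mels x n_frames cut into patch x patch tiles."""
    f_dim = (n_mels - patch) // stride + 1
    t_dim = (n_frames - patch) // stride + 1
    return f_dim, t_dim, f_dim * t_dim


def ast_token_shape(seconds: float, n_mels: int = 128, embed_dim: int = 768,
                    stride: int = 10) -> Tuple[int, int]:
    """Shape of the encoder input: one 768-d embedding per patch plus a
    single CLS token (as described above)."""
    n_frames = round(100 * seconds)  # ~100 frames per second (10 ms hop)
    _, _, n_patches = ast_patch_grid(n_mels, n_frames, stride=stride)
    return n_patches + 1, embed_dim


if __name__ == "__main__":
    print(ast_patch_grid(128, 500))         # (12, 49, 588) for a 5-second chunk
    print(ast_token_shape(5.0))             # (589, 768)
    print(ast_token_shape(5.0, stride=16))  # (249, 768): non-overlapping patches
```

For a 5-second chunk this gives 12 × 49 = 588 patch tokens plus the CLS token, i.e. a 589 × 768 input to the encoder. Non-overlapping patches (stride 16) would shorten the sequence to 249 tokens, at the cost of coarser time-frequency coverage.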

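
The chunk-level inference described in the Model section can be sketched as follows. The 16 kHz sample rate and 5-second chunk length come from this README; the helper names are hypothetical, and combining chunks by keeping each species' maximum score across chunks is an assumption, since the README only says that the species with the highest score is extracted.

```python
from typing import Dict, List, Sequence, Tuple

SAMPLE_RATE = 16_000  # recordings are re-sampled to 16 kHz (see Preprocessing)
CHUNK_SECONDS = 5     # the model scores each 5-second chunk


def chunk_bounds(num_samples: int, sr: int = SAMPLE_RATE,
                 chunk_s: int = CHUNK_SECONDS) -> List[Tuple[int, int]]:
    """(start, end) sample indices of consecutive, non-overlapping chunks.

    The last chunk may be shorter than chunk_s; padding it to full length
    before feature extraction is left to the caller (assumption).
    """
    step = sr * chunk_s
    return [(start, min(start + step, num_samples))
            for start in range(0, num_samples, step)]


def top_species(chunk_scores: Sequence[Dict[str, float]]) -> Tuple[str, float]:
    """Keep each species' best score over all chunks (max-pooling, an
    assumption) and return the highest-scoring species with its score."""
    best: Dict[str, float] = {}
    for scores in chunk_scores:
        for species, score in scores.items():
            best[species] = max(score, best.get(species, float("-inf")))
    if not best:
        raise ValueError("no chunk scores given")
    winner = max(best, key=best.__getitem__)
    return winner, best[winner]


if __name__ == "__main__":
    # A 12-second recording: two full chunks and a 2-second tail.
    print(chunk_bounds(12 * SAMPLE_RATE))
    # [(0, 80000), (80000, 160000), (160000, 192000)]
    # Made-up per-chunk scores, for illustration only.
    scores = [{"Parus major": 0.91, "Erithacus rubecula": 0.20},
              {"Parus major": 0.40, "Erithacus rubecula": 0.95}]
    print(top_species(scores))  # ('Erithacus rubecula', 0.95)
```

Max-pooling lets a species that sings clearly in only one chunk still win; averaging scores over chunks would be a reasonable alternative that favors species heard throughout the recording.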

Dataset

All bird recordings were obtained from the Xeno Canto [2] website. We downloaded 335k recordings of 400 bird species in Türkiye and created metadata for them. Here are the train, validation and test dataset statistics.

Split into 5-second chunks   Train    Validation   Test
No (whole recordings)        268k     33.5k        33.5k
Yes (5-second chunks)        1.4M     600k         300k

Preprocessing

  • Converted recordings to WAV format.
  • Re-sampled them to 16 kHz.
  • Split audio into 40-second segments to speed up the mel spectrogram conversion.
  • Created metadata and checked labels against those of eBird [3].
  • Split the data into train, validation and test sets in 80% / 10% / 10% proportions, respectively.
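
A minimal sketch of the 80% / 10% / 10% split, assuming the split is done over recording IDs (hypothetical identifiers here) with a fixed random seed; the README does not say how the split was seeded or whether it was stratified by species.

```python
import random
from typing import List, Sequence, Tuple


def split_80_10_10(recording_ids: Sequence[str], seed: int = 0
                   ) -> Tuple[List[str], List[str], List[str]]:
    """Shuffle recording IDs with a fixed seed and split them 80/10/10.

    Integer arithmetic avoids float rounding; any remainder goes to test.
    """
    ids = list(recording_ids)
    random.Random(seed).shuffle(ids)  # deterministic for a given seed
    n_train = len(ids) * 8 // 10
    n_val = len(ids) // 10
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]
    return train, val, test


if __name__ == "__main__":
    ids = [f"rec{i}" for i in range(335_000)]  # hypothetical IDs
    train, val, test = split_80_10_10(ids)
    print(len(train), len(val), len(test))  # 268000 33500 33500
```

Applied to 335k recordings, this arithmetic gives 268k / 33.5k / 33.5k, the same counts as the recording-level row of the dataset table. Splitting whole recordings before cutting them into chunks keeps chunks of one recording from landing in different splits.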

Results

Here are some results of our model. We tested it on two datasets: the Xeno Canto test split described in the Dataset section, and around 7k real-world recordings from different regions of Türkiye obtained from eBird [3].

Metric              Xeno Canto   eBird
F1 (micro)          0.7229       0.7061
Precision (macro)   0.9272       0.65
Precision (micro)   0.9272       0.8549
Recall (micro)      0.602        0.59
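
Micro- and macro-averaged metrics differ in where the averaging happens. The sketch below assumes each evaluated item (a chunk or a recording; the README does not say which) has a set of true and a set of predicted species; the function names are hypothetical. scikit-learn's `precision_recall_fscore_support` offers the same two averaging modes via `average="micro"` and `average="macro"`.

```python
from typing import Dict, List, Sequence, Set, Tuple

Metrics = Dict[str, float]


def prf(tp: int, fp: int, fn: int) -> Metrics:
    """Precision, recall and F1 from raw counts (0.0 when undefined)."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return {"precision": p, "recall": r, "f1": f1}


def micro_macro(y_true: Sequence[Set[str]],
                y_pred: Sequence[Set[str]]) -> Tuple[Metrics, Metrics]:
    """Micro: pool TP/FP/FN over all species, then compute the metrics.
    Macro: compute the metrics per species, then take the unweighted mean."""
    species = sorted(set().union(*y_true, *y_pred))
    if not species:
        raise ValueError("no species in y_true or y_pred")
    counts: Dict[str, List[int]] = {s: [0, 0, 0] for s in species}  # tp, fp, fn
    for true, pred in zip(y_true, y_pred):
        for s in pred:
            counts[s][0 if s in true else 1] += 1  # correct -> tp, else fp
        for s in true - pred:
            counts[s][2] += 1                      # missed -> fn
    totals = [sum(c[i] for c in counts.values()) for i in range(3)]
    micro = prf(*totals)
    per_species = [prf(*c) for c in counts.values()]
    macro = {k: sum(m[k] for m in per_species) / len(per_species)
             for k in ("precision", "recall", "f1")}
    return micro, macro


if __name__ == "__main__":
    # Toy labels "A" and "B" stand in for species names.
    y_true = [{"A"}, {"B"}, {"A", "B"}]
    y_pred = [{"A"}, {"A"}, {"B"}]
    micro, macro = micro_macro(y_true, y_pred)
    print(micro)  # precision 2/3, recall 0.5, f1 4/7
    print(macro)  # precision 0.75, recall 0.5, f1 7/12
```

Because micro averaging pools counts, frequent species dominate it, while macro averaging weights every species equally; this is one way macro and micro precision can diverge, as they do on the eBird set.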

Members

  • Farrin Marouf Sofian: ML Researcher
  • Andrew Bond: DevOps
  • Kutay Eroğlu: DevOps
  • Ömer Faruk Aksoy: Full Stack iOS Development
  • Can Köz: Full Stack iOS Development
Special thanks to Prof. Aykut Erdem, Prof. Barış Akgün, Prof. Erkut Erdem, Prof. Çağlar Akçay and Burak Can Biner for their help and guidance throughout the project.

References

[1] Gong, Y., Chung, Y.-A. and Glass, J., 2021. AST: Audio Spectrogram Transformer. In Proc. Interspeech.

[2] Xeno-canto Foundation, 2022. URL https://xeno-canto.org.

[3] eBird, 2021. eBird: An online database of bird distribution and abundance [web application]. eBird, Cornell Lab of Ornithology, Ithaca, New York. URL http://www.ebird.org (Accessed: May 15, 2022).

[4] Swift. URL https://www.swift.org/ (Accessed: May 24, 2022).
