BirdNET Pi: some theory on classification & some practical hints

1. How does the classification of bird sounds work in BirdNET-Pi?

A small computer (Raspberry Pi) processes the audio signal digitised by the USB audio interface and feeds it into a pre-trained Deep Convolutional Neural Network. The results are then displayed in real time on a web page that can be viewed on any digital device (smartphone, iPod, tablet, laptop, PC, etc.).
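In code terms, the analysis step looks roughly like the following minimal sketch (Python, using a TFLite interpreter). The model path, input shape and the 3-second / 48 kHz chunking are assumptions based on how the BirdNET models are commonly packaged, not the exact BirdNET-Pi source:

```python
# Minimal sketch of the analysis step, assuming a BirdNET-style TFLite model
# that takes 3-second chunks of 48 kHz mono audio. Paths and shapes are
# illustrative, not the exact BirdNET-Pi implementation.
import numpy as np
import tflite_runtime.interpreter as tflite

SAMPLE_RATE = 48000        # assumed model input rate
CHUNK_SECONDS = 3          # assumed analysis window

interpreter = tflite.Interpreter(model_path="model.tflite")  # example path
interpreter.allocate_tensors()
input_index = interpreter.get_input_details()[0]["index"]
output_index = interpreter.get_output_details()[0]["index"]

def classify_chunk(samples: np.ndarray) -> np.ndarray:
    """Run one audio chunk through the network and return raw class scores."""
    chunk = samples.astype(np.float32).reshape(1, SAMPLE_RATE * CHUNK_SECONDS)
    interpreter.set_tensor(input_index, chunk)
    interpreter.invoke()
    return interpreter.get_tensor(output_index)[0]
```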

2. Who developed the Deep Convolutional Neural Network models used in BirdNET-Pi?

BirdNET (a neural network trained with several hundred thousand bird call recordings) was developed by Dr. Stefan Kahl. The system is also available as a smartphone app. At the time of writing, two different BirdNET models are available in BirdNET-Pi. For both of them, the source code and the different models (classifiers) are freely available:

BirdNET-Lite:

  • A single model with "global" coverage, spanning more than 6,000 bird species
  • https://github.com/kahst/BirdNET-Lite
  • The GitHub page states "This repository is deprecated", because the repo is no longer actively maintained by the developer Stefan Kahl. However, the MODEL is NOT "deprecated": it is pretty good and remains one of the best bird classifiers available (see the section on comparing models below)

BirdNET-Analyzer:

  • Different models available, the largest and latest model v2.4 covers about 6,000 bird species
  • https://github.com/kahst/BirdNET-Analyzer
  • This repo is actively maintained and the models are in active development. However, not every newly released model is necessarily better than the older ones; their performance has to be rigorously tested.

3. What are False Positives and what are False Negatives? What is model sensitivity and model specificity?

  • False Positive: the model reports that the bird is there, but it is not present in the recording
  • False Negative: the model reports that the bird is not there, but it is actually audible in the recording
  • A model with high sensitivity can detect very faint/quiet bird sounds. Thus, it will miss few sounds of a given species. However, because it is so sensitive, many of its detections will be false detections, i.e. False Positives.
  • A model with high specificity is very reliable: if a bird species is detected, that detection is usually correct. However, the model will miss many faint/quiet bird sounds (False Negatives), because those cannot be detected with high reliability. A model with high specificity therefore has a high False Negative rate. A toy calculation after this list makes the two quantities concrete.
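Here is that toy calculation; all counts are invented purely for illustration:

```python
# Toy example: sensitivity and specificity from validated detection counts.
# All numbers are invented for illustration.
true_positives  = 80   # bird present and correctly detected
false_negatives = 20   # bird present but missed
true_negatives  = 90   # bird absent and correctly not reported
false_positives = 10   # bird absent but reported anyway

sensitivity = true_positives / (true_positives + false_negatives)  # 0.80
specificity = true_negatives / (true_negatives + false_positives)  # 0.90
print(f"sensitivity = {sensitivity:.2f}, specificity = {specificity:.2f}")
```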

4. How reliable are the bird identifications by BirdNET-Pi?

Please always keep in mind: no automatic classification model (nor any human ornithologist) is perfect. In my experience, and according to the scientific papers I have read, the quality of BirdNET's classifications is at a professional level, and BirdNET has achieved good rankings in the annual competitions for the best artificial intelligence for this purpose. It is important to remember that the quality of the classifications depends on the chosen "sensitivity" and "specificity" of the classifier. There is no (and there cannot be) 100% accuracy. Sensitivity and specificity are inversely related, so it is not possible to optimise both parameters at the same time.

Even an experienced human ornithologist makes mistakes. The difference is that with BirdNET we notice these mistakes. Birds that an ornithologist misses in the field (too low sensitivity) are not recorded in any statistic, and birds that an ornithologist misidentifies in the field (lack of specificity) are also very rarely recorded as errors. With BirdNET-Pi, a lack of sensitivity is likewise difficult to quantify (no recordings are made if a bird species is overlooked), but a lack of specificity is easy to prove, since recordings are available that can be checked. In this respect, a comparison between humans and machines suggests itself, but it is very difficult to objectify (it would require a suitable study design). The sensitivity and specificity of BirdNET-Pi can be chosen and adjusted by each user --> see the next section.

5. How can I tune BirdNET-Pi's parameters for my specific purpose?

The "Confidence" of a "detection" indicates how high the probability is that the respective determination is correct. In the "Advanced Settings" I can set the minimum value that a detection must have in order to be included in the database. The higher this value, the more reliable the determination. This corresponds to a higher specificity of the determination. However, the sensitivity decreases at the same time.

If I want my BirdNET-Pi to be very sensitive (i.e. to have a high sensitivity), I can raise the "Sigmoid sensitivity" value under "Tools --> Settings --> Advanced Settings" (a value of 1.0 to 1.25 is recommended; it can be set from 0.5 to 1.5). A higher sensitivity leads to fewer False Negatives, i.e. I miss fewer birds. At the same time, however, the proportion of False Positives increases, i.e. birds that are incorrectly identified. This is equivalent to a decrease in specificity. Sensitivity and specificity are therefore inversely related and cannot both be optimised at the same time. It is the user's decision to increase sensitivity and accept a lower specificity OR to decrease sensitivity and thereby achieve a higher specificity. The sensitivity is set with the parameter "Sigmoid sensitivity", the specificity with the parameter "Minimum Confidence". Another way to prevent False Positives is to adjust the "Species Occurrence Frequency Threshold" (only if you use the BirdNET-Analyzer model, see the section below). The geographical coordinates of the location are very important for a reliable ID, so please always check them and correct them if necessary. A simple and quick calculator for the coordinates of a site can be found HERE. Remember that 1 decimal place is enough for the model.
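The interplay of the two knobs can be sketched as follows. The sigmoid formula mirrors the custom sigmoid used in the BirdNET code base, but the exact scaling in BirdNET-Pi may differ; the raw score and the 0.7 threshold are invented examples:

```python
# Illustrative sketch (not the exact BirdNET-Pi source): "Sigmoid sensitivity"
# rescales the raw network output, "Minimum Confidence" then decides which
# detections enter the database.
import numpy as np

def confidence(raw_score, sigmoid_sensitivity=1.0):
    # A higher sensitivity steepens the sigmoid, pushing faint detections
    # above the confidence threshold (more True AND more False Positives).
    return 1.0 / (1.0 + np.exp(-sigmoid_sensitivity * raw_score))

raw_score = 1.2                # invented raw network output for one species
for s in (0.5, 1.0, 1.5):      # the allowed "Sigmoid sensitivity" range
    c = confidence(raw_score, s)
    kept = c >= 0.7            # invented "Minimum Confidence" of 0.7
    print(f"sensitivity={s}: confidence={c:.2f}, stored={kept}")
```

With these example numbers, the same raw score is discarded at sensitivity 0.5 (confidence 0.65) but stored at 1.0 and 1.5, which is exactly the trade-off described above.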

6. How can models be compared objectively?

You need a test data set with hundreds of validated recordings of many species: at least 20 recordings of every species in your model, each validated by a human ear. You then play this test data set to the model and look at the confidence scores it produces. Next, you plot the ROC curves (which show the True Positive Rate and False Positive Rate of the classifications for different confidence threshold values) for your data set and calculate the area under these curves (the AUC value). The larger the AUC, the better the model. Unless you do this genuinely laborious comparison, you should not judge a model's quality. Never use "number of detections", "the new model has more detections" or "the new model is more resilient against classifying police cars as Screech Owl" as a measure of model performance. By the way, here is a paper comparing the two models "BirdNET-Lite" and "BirdNET-Analyzer v2.1" with each other. Be aware that this comparison only holds for the few species used in the paper.

Have a look at figure 1 of that paper. It shows that BirdNET-Analyzer performs better for some species, but for a few species it performs worse than BirdNET-Lite.
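Once you have such a validated test set, the ROC/AUC computation itself is only a few lines, e.g. with scikit-learn. The labels and scores below are toy placeholders for your validated data:

```python
# Minimal sketch of the ROC/AUC evaluation described above (scikit-learn).
# `labels` is the human-validated ground truth (1 = species really present),
# `scores` are the confidence values the model produced; both are toy data.
import numpy as np
from sklearn.metrics import roc_curve, auc

labels = np.array([1, 1, 0, 1, 0, 0, 1, 0])
scores = np.array([0.90, 0.80, 0.70, 0.60, 0.40, 0.35, 0.30, 0.10])

fpr, tpr, thresholds = roc_curve(labels, scores)
print(f"AUC = {auc(fpr, tpr):.3f}")  # the larger the AUC, the better the model
```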

7. Which BirdNET model should I choose?

As discussed in the last section, BirdNET-Analyzer and BirdNET-Lite behave differently for different species, so it is not easy to give a general recommendation. However, BirdNET-Analyzer works differently internally: it creates a species list based on your lat and lon coordinates, using eBird distribution data. The parameter "Species Occurrence Frequency Threshold" in the Settings can be set for a larger species list (lower value = lower probability threshold for a species to occur in your region) or a smaller regional species set (higher value = higher probability threshold for a species to occur in your region). That regional species list is then used in post-processing to filter your classification results, giving you only those species that are highly likely to occur in your region according to eBird distribution data. Of course, this prevents you from identifying an unusual species that would normally not occur in your region. BirdNET-Lite also has this kind of filtering, but it works differently internally and you cannot tune the regional threshold parameter.
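Conceptually, this post-processing filter boils down to something like the following sketch. Species names, frequencies and the threshold are invented; the real implementation derives the list from the eBird-based model:

```python
# Illustrative sketch of the regional post-processing filter: keep only
# detections whose species clears the occurrence-frequency threshold for the
# site. All values below are invented examples.
occurrence_frequency = {           # derived from eBird data for your lat/lon
    "Turdus merula": 0.62,
    "Erithacus rubecula": 0.48,
    "Upupa epops": 0.01,           # rare vagrant at this example site
}
THRESHOLD = 0.03                   # "Species Occurrence Frequency Threshold"

def passes_regional_filter(species: str) -> bool:
    return occurrence_frequency.get(species, 0.0) >= THRESHOLD

detections = ["Turdus merula", "Upupa epops"]
kept = [s for s in detections if passes_regional_filter(s)]
print(kept)                        # the vagrant is filtered out
```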

In summary, from my understanding and experience with the two models, I would recommend the following:

  • Use BirdNET-Analyzer if you want your BirdNET-Pi to be as specific as possible (higher specificity == lower False Positive Rate) and you want to eliminate False Positives as much as you can (adjust this with the "Species Occurrence Frequency Threshold" in the Settings)
  • Use the BirdNET-Lite model if you want your BirdNET-Pi to be more sensitive for uncommon species, while accepting many more False Positives (higher sensitivity == lower False Negative Rate). If you are outside North America and outside Europe, you probably have no choice but to use BirdNET-Lite, because it covers twice as many species!

So, it is up to you to choose which model you would like to use. In both models, you should adjust the behaviour for your needs with the settings "Sigmoid Sensitivity" (higher value == higher sensitivity == fewer False Negatives, but more False Positives) and "Minimum Confidence" (higher value == higher specificity == fewer False Positives, but more False Negatives).

8. Are there better models available than the ones that BirdNET-Pi uses?

Sure. It all depends on your purpose and the goal you want to achieve. For example: for NocMig (the detection of nocturnally migrating birds) there are certainly better models in development (see HERE or HERE for an example). However, these are often developed specifically for certain regions and work poorly or not at all in other regions. There are also approaches that take existing models and fine-tune/train them with locally produced recordings. This trains the model on specific features of the landscape, the anthropogenic noise, the noise characteristics of the specific recorder model used, etc. All these factors contribute to a model's performance. It has been shown that such local fine-tuning can significantly improve performance. However, a model that has been fine-tuned for a specific region is expected to perform much worse in other regions, so it should only be used locally! One comparatively light-weight variant is sketched below.
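That variant keeps the pre-trained network frozen as a feature extractor and trains only a small new classifier on its embeddings, using locally validated recordings. In the hedged sketch below, the random arrays stand in for real BirdNET embeddings and human-validated labels:

```python
# Hedged sketch of local fine-tuning via a frozen feature extractor: only a
# lightweight classifier head is trained on top of pre-computed embeddings.
# The random arrays are placeholders for real BirdNET embeddings and labels.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 320))   # placeholder BirdNET embeddings
labels = rng.integers(0, 2, size=200)      # 1 = target species confirmed by ear

# The acoustic features stay frozen; only this head learns from the local
# data (landscape, anthropogenic noise, recorder characteristics).
head = LogisticRegression(max_iter=1000).fit(embeddings, labels)
print("training accuracy:", head.score(embeddings, labels))
```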

(c) Put together by Frank Dziock, DD4WH, all errors are mine.