Architecture description of BirdNET v.2.4 #177
Comments
Small update. I just now found that part of what I am asking for is in fact in the technical description in the README:
I had missed this prior to creating this issue, and it is very helpful. It would still be interesting to know some of the specific design choices, and whether there are any important differences from the original EfficientNetB0 backbone. Personally, I would like to know whether global pooling is used in BirdNET v2.4. In most EfficientNetB0 implementations this is an option.
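For readers unfamiliar with the option being asked about: global average pooling collapses the spatial grid of a feature map into a single value per channel before the classifier. In the Keras `EfficientNetB0` application, for instance, it is controlled by the `pooling` argument. The following is a minimal pure-Python sketch of the operation itself, with made-up values; it is not BirdNET code, and whether v2.4 uses this step is exactly the open question here.

```python
from typing import List

# A feature map as nested lists with shape (height, width, channels).
FeatureMap = List[List[List[float]]]


def global_average_pool(feature_map: FeatureMap) -> List[float]:
    """Average each channel over all spatial positions."""
    height = len(feature_map)
    width = len(feature_map[0])
    channels = len(feature_map[0][0])
    sums = [0.0] * channels
    for row in feature_map:
        for pixel in row:
            for c, value in enumerate(pixel):
                sums[c] += value
    return [s / (height * width) for s in sums]


# Toy 2x2 feature map with 3 channels (illustrative values only).
toy_map = [
    [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]],
    [[5.0, 4.0, 2.0], [7.0, 0.0, 2.0]],
]
pooled = global_average_pool(toy_map)
print(pooled)  # [4.0, 1.0, 2.0]
```

Without pooling, the backbone output keeps its height and width dimensions, so the answer affects how the final layers have to be wired when porting the model to another framework.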
I am also interested in the details required to reproduce the BirdNET model from scratch, such as a list of samples, a train-test split, and the code for the model. Are these available somewhere that I might be missing? On an adjacent note, I noticed an attempt to convert the weights into a PyTorch model, but it seems they gave up on it. You can find more information at this repo issue: dsgt-birdclef/birdclef-2023#29. Having the model code could be helpful in achieving that goal.
We can't easily publish the training code, as it is very complex with many nuances and, most of all, messy :) We'll do a better job in the future: we're planning on releasing a repo with the full implementation developed from scratch so that people can actually use it. In the meantime, would it help if we published the Keras model with its custom layers? It would be much easier to inspect and might also be easier to convert to ONNX.
I understand that this has been a long research project, and I can imagine the complexity of the codebase. It's been a while since I used Keras, but I am happy to give it a try. As for the dataset details, are you planning to release them?
Yes, the Keras model with the custom layers should be sufficient to resolve this issue. Personally, I would like to know whether global pooling is used in the model, and if that can be made out from the suggested solution I am happy! :) Maybe another issue should be created regarding the training procedure of the model, @EnisBerk? That was not really meant to be part of this issue, even though I can see how the two are closely related. This issue only requires a description of the model architecture to be resolved. Thanks for the quick reply, @kahst!
Ok, here are a few details on that:
We primarily train on focal-follow recordings from Xeno-canto and Macaulay Library (~90% of our training data), a number of soundscape datasets from BirdCLEF competitions (see this comment), and a few proprietary sources like BirdNET app data (but those are only a fraction of the training data). In case you're interested, this is our spectrogram layer implementation:
Thank you very much for this information! That is very helpful.
This is really helpful. Thank you very much!
Thanks for providing that info! It is interesting to see!
We train on ~10 million 3-second snippets of audio, with a maximum of 5,000 per species. The train set has been the same since V2.0.
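To make the per-species cap concrete, here is a small sketch of limiting a labelled snippet list to a maximum count per species. The function name `cap_per_species`, the `(snippet_id, species)` tuple layout, and the use of seeded random sampling are all assumptions for illustration; the comment above does not say how the retained snippets are actually chosen.

```python
import random
from collections import defaultdict


def cap_per_species(snippets, max_per_species=5000, seed=0):
    """Keep at most max_per_species snippets for each species label.

    snippets: iterable of (snippet_id, species) pairs (hypothetical layout).
    Random selection is an assumption, not the documented BirdNET procedure.
    """
    by_species = defaultdict(list)
    for snippet_id, species in snippets:
        by_species[species].append(snippet_id)

    rng = random.Random(seed)  # seeded so the selection is reproducible
    capped = {}
    for species, ids in by_species.items():
        if len(ids) > max_per_species:
            ids = rng.sample(ids, max_per_species)
        capped[species] = ids
    return capped


# Toy data with a cap of 2 instead of 5,000.
toy_snippets = [
    ("a1", "robin"), ("a2", "robin"), ("a3", "robin"),
    ("b1", "wren"),
]
capped = cap_per_species(toy_snippets, max_per_species=2)
print({species: len(ids) for species, ids in capped.items()})
# {'robin': 2, 'wren': 1}
```

A cap like this keeps very common species from dominating the training set, while rare species simply contribute everything they have.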
In the main BirdNET paper, the authors reference this process for segmenting bird audio from background noise. I am curious to know whether this segmentation process has evolved since the Ecological Informatics paper.
Keras model availability would be really good. I would like to switch from tflite to ONNX Runtime, as that would simplify my BirdNET-Go significantly: I could get rid of the C library compatibility layer.
I was just in contact with Holger Klinck in the "AI for Conservation" Slack channel regarding a detailed description of the current model v2.4 architecture. He suggested that I open an issue for this here.
I think it would be great to have this information available somewhere on the GitHub page, or, if it already exists somewhere, a pointer to it in the version history in the README or similar.
My current understanding is that it is based on EfficientNet V2 blocks, but some more details would be great to properly understand what model I am using.
Thanks for the great work on this model!