Request: German species list #4

Open
krummrey opened this issue Apr 19, 2021 · 11 comments

Comments

@krummrey

I got BirdNET-Lite running and want to analyse a day's worth of recordings from my garden. The generated output is in Latin and English. Do you have a German-language labels.txt like the one in the phone app?

Also, can you elaborate a little on the confidence values? Where would a "reliable" threshold be: at 0.5 or at 0.9?

Thanks for the great work. I have so much fun "catching" new birds.

@euxoa

euxoa commented Apr 19, 2021

I stumbled into this only two days ago. With some data processing skills, you can import German names into the model/labels.txt file, if you have a list with matching scientific vs German names available. I did that for Finnish names, in R:

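library(dplyr)

# labels.txt has one "Scientific name_English name" entry per line.
labels <- readr::read_delim("labels.txt", delim="_", col_names=c("name", "ename"))
# Tab-separated list of scientific vs. Finnish names (Latin-1 encoded).
transl <- readr::read_delim("../Maailman-lintujen-suomenkieliset-nimet-20180731.txt",
			    delim="\t", locale=readr::locale(encoding="latin1"))
labels %>%
       # Join on the scientific name only; labels without a translation get NA.
       left_join(transl %>% transmute(name=`Tieteellinen nimi`, fname=`Nimi suomeksi`), by="name") %>%
       select(name, fname, ename) %>%
       # Fall back to the English name where no Finnish name was found.
       mutate(cname=ifelse(is.na(fname), ename, fname)) %>%
       mutate(line=paste(name, cname, sep="_")) %>%
       { paste(.$line, collapse="\n")} %>% {writeLines(., con="labels2.txt") }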

Just keep the format as name1_name2 for each line, and the positions of lines in the file intact.
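For anyone who prefers Python, the same idea can be sketched roughly as follows. This is only an illustration: `translate_labels` and the tiny sample data are made up for this note, and the sketch assumes every label line has the form `Scientific name_Common name`.

```python
def translate_labels(label_lines, translations):
    """Replace the common name in each label line where a translation exists.

    label_lines: lines of the form "Scientific name_Common name".
    translations: dict mapping scientific name -> localized common name.
    The output has the same number of lines, in the same order.
    """
    out = []
    for line in label_lines:
        # Split on the first underscore only, keeping the rest of the line whole.
        sci, sep, common = line.partition("_")
        if not sep:
            # Unexpected format: keep the line untouched so positions stay intact.
            out.append(line)
            continue
        # Fall back to the existing common name when no translation is known.
        out.append(f"{sci}_{translations.get(sci, common)}")
    return out


# Hypothetical sample data, not taken from the real labels.txt.
labels = [
    "Turdus merula_Eurasian Blackbird",
    "Erithacus rubecula_European Robin",
    "Abroscopus albogularis_Rufous-faced Warbler",
]
german = {"Turdus merula": "Amsel", "Erithacus rubecula": "Rotkehlchen"}
translated = translate_labels(labels, german)
print("\n".join(translated))
```

Because untranslated entries keep their original name, the line count and line order never change, which is the property that matters here.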

As for reliability, I don't think you get complete reliability with any cutoff. I have used the default 0.1 and then pulled the lines with confidence above 0.7 or 0.9 out of the result file. If some of these are interesting, I look at the spectrogram in Audacity and listen to the audio to confirm or reject the identification.

But the result file with a cutoff of 0.1 is useful as well! Birds tend to show continuity: typically the same bird appears in successive or almost successive frames. If you get something unexpected on the high-confidence list, look at the context around that frame in the file with cutoff=0.1. Often you find the correct ID in the frames nearby (and it is a common bird).
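A rough Python sketch of that review step, for illustration only. It assumes the detections have already been parsed into `(frame_index, label, confidence)` tuples; that layout, the function name `review_candidates`, and the sample values are all invented here and are not the actual result file format.

```python
def review_candidates(detections, high=0.7, context=2):
    """Pick high-confidence detections and list other labels seen nearby.

    detections: list of (frame_index, label, confidence) from a low-cutoff run.
    high: confidence at or above which a detection is worth reviewing.
    context: how many frames before and after to inspect.
    """
    result = []
    for frame, label, conf in detections:
        if conf < high:
            continue
        # Collect the different labels that appear within the context window.
        nearby = sorted({
            other_label
            for other_frame, other_label, _ in detections
            if abs(other_frame - frame) <= context and other_label != label
        })
        result.append((frame, label, conf, nearby))
    return result


# Hypothetical detections: one surprising hit surrounded by a common species.
detections = [
    (10, "Parus major", 0.35),
    (11, "Parus major", 0.42),
    (12, "Periparus ater", 0.81),
    (13, "Parus major", 0.55),
    (40, "Turdus merula", 0.93),
]
candidates = review_candidates(detections)
for frame, label, conf, nearby in candidates:
    print(frame, label, conf, nearby)
```

A high-confidence hit whose neighbours all carry a different label is a good candidate for a closer look at the spectrogram.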

Having been with this only a couple of nights, my workflow is definitely not final yet.

That said, I have already found interesting stuff in last year's recordings. BirdNET seems to have the potential to speed up browsing of WAV files and to reveal things that would otherwise go unnoticed. Just don't take the IDs at face value. They need to be confirmed somehow.

I think the identifications would be better if there were some continuity of scores over successive frames. It's not built into the current model, but maybe one can bring some of it in by post-processing. Judging from the source of analyze.py, it would be best to take the whole score vector over several (say 3–10) frames and smooth it somehow, maybe by convolution in logit space, or with a Markov model. I just don't currently have a ground truth against which to optimise the required smoothness parameter. ;) A crude way is to look manually for identical labels in many consecutive frames. I have to think about this, although I probably won't have time to code anything.
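One possible reading of the smoothing idea, as a minimal sketch: a plain moving average (a box kernel) over per-frame scores in logit space. This is not part of BirdNET; `smooth_scores` is a made-up name, the sketch assumes the scores are probabilities strictly between 0 and 1, and the window size is an untuned guess.

```python
import math


def logit(p, eps=1e-6):
    # Clip to avoid infinities at exactly 0 or 1.
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def smooth_scores(frames, window=3):
    """Average each class score over a centred window of frames, in logit space.

    frames: list of per-frame score vectors (all the same length).
    window: number of frames in the window; shorter at the edges.
    Returns a list of the same shape with smoothed probabilities.
    """
    half = window // 2
    logits = [[logit(p) for p in frame] for frame in frames]
    smoothed = []
    for i in range(len(logits)):
        lo, hi = max(0, i - half), min(len(logits), i + half + 1)
        smoothed.append([
            sigmoid(sum(logits[j][k] for j in range(lo, hi)) / (hi - lo))
            for k in range(len(logits[i]))
        ])
    return smoothed


# Hypothetical scores for two classes over three frames: the middle frame
# briefly favours class 1, while its neighbours favour class 0.
frames = [[0.6, 0.1], [0.2, 0.7], [0.6, 0.1]]
smoothed = smooth_scores(frames, window=3)
print(smoothed[1])
```

In this toy example the isolated spike for the second class in the middle frame is pulled down below the first class after smoothing. Whether that is the right behaviour on real recordings would need checking against a ground truth.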

@patlevin

patlevin commented Jul 26, 2021

If you are still interested, I have compiled a list of localised labels for 29 languages with varying levels of completeness:

| Language | Missing labels | Missing labels (%) |
| --- | --- | --- |
| Afrikaans | 5774 | 90.76% |
| Catalan | 544 | 8.55% |
| Chinese | 264 | 4.15% |
| Chinese (Traditional) | 295 | 4.64% |
| Croatian | 370 | 5.82% |
| Czech | 683 | 10.74% |
| Danish | 460 | 7.23% |
| Dutch | 264 | 4.15% |
| Estonian | 3171 | 49.84% |
| Finnish | 518 | 8.14% |
| French | 264 | 4.15% |
| German | 264 | 4.15% |
| Hungarian | 2688 | 42.25% |
| Icelandic | 5588 | 87.83% |
| Indonesian | 5550 | 87.24% |
| Italian | 524 | 8.24% |
| Japanese | 640 | 10.06% |
| Latvian | 4821 | 75.78% |
| Lithuanian | 597 | 9.38% |
| Northern Sami | 5605 | 88.10% |
| Norwegian | 325 | 5.11% |
| Polish | 265 | 4.17% |
| Portuguese | 2742 | 43.10% |
| Russian | 808 | 12.70% |
| Slovak | 264 | 4.15% |
| Slovenian | 5532 | 86.95% |
| Spanish | 348 | 5.47% |
| Swedish | 264 | 4.15% |
| Thai | 5580 | 87.71% |
| Ukrainian | 646 | 10.15% |

My localisation database is missing 264 entries that are found in labels.txt, hence no language has 0% missing entries.
I have attached the localised labels (formatted as label_<ISO 639-1>.txt).

If there's any interest in this, I can create a pull-request as well.

UPDATED labels_l18n.zip

EDIT: the previous files didn't match the original labels.txt, which led to problems. The above link contains the fixed files.

@nilspupils

That is great! Thank you for sharing!

@ghost

ghost commented Oct 5, 2021

Is it possible to get a full labels.txt list for German with all entries?

I tried the incomplete list and ran into an error (see #11), and I also got different results for the Latin names.

@patlevin

patlevin commented Oct 5, 2021

@Christoph-Lauer

> Is it possible to get a full labels.txt list for German with all entries?
>
> I tried the incomplete list and ran into an error (see #11), and I also got different results for the Latin names.

I will regenerate the labels so they match the original labels.txt. The names are definitely correct, the order, however, might not be. I'll fix that right away.

@ghost

ghost commented Oct 5, 2021

I can confirm that the names are correct, but the number of lines differs between the two files. I would be happy to get a labels.txt file with the same number of lines as the TF model expects (6362 lines).
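A quick way to spot this kind of mismatch, sketched in Python. `check_alignment` is a hypothetical helper written for this note; it assumes both files use the `Scientific name_Common name` format and compares them line by line.

```python
def check_alignment(original_lines, localized_lines):
    """Return a list of problems; the list is empty when the files line up.

    Both inputs are lists of "Scientific name_Common name" lines.
    """
    problems = []
    if len(original_lines) != len(localized_lines):
        problems.append(
            f"line count differs: {len(original_lines)} vs {len(localized_lines)}"
        )
    # Compare the scientific names position by position (up to the shorter file).
    pairs = zip(original_lines, localized_lines)
    for number, (orig, loc) in enumerate(pairs, start=1):
        orig_sci = orig.partition("_")[0]
        loc_sci = loc.partition("_")[0]
        if orig_sci != loc_sci:
            problems.append(f"line {number}: {orig_sci!r} vs {loc_sci!r}")
    return problems


# Hypothetical two-line files for illustration.
original = ["Turdus merula_Eurasian Blackbird", "Erithacus rubecula_European Robin"]
good = ["Turdus merula_Amsel", "Erithacus rubecula_Rotkehlchen"]
bad = ["Erithacus rubecula_Rotkehlchen"]

print(check_alignment(original, good))
print(check_alignment(original, bad))
```

With real files, the two lists could be read with `open(path, encoding="utf-8").read().splitlines()`; the encoding is an assumption and may need adjusting.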

@patlevin

patlevin commented Oct 5, 2021

@Christoph-Lauer I updated the archive in my comment above with the fixed files.

Also: corrected list with German names

Let me know if that fixes things, I'll redo the name mapping otherwise.

@ghost

ghost commented Oct 5, 2021

YOU MAKE MY DAY ;-)

@nilspupils

Good news! Thanks @Christoph-Lauer for checking and @patlevin for setting up the new list!

@DD4WH

DD4WH commented Dec 7, 2021

Thanks a lot for the German species list!
I made a few minor corrections and added a few names that were still in English (e.g. some of the warblers).

labels_de.txt

@nilspupils

This is a German-language list of only the European species. Thanks to @DD4WH for his lists, which I compiled into this one. Please check for errors, as this was a really messy Excel job.
labels_de_europe.txt
